WP99-15

ABSTRACT 

Censored and truncated regression models with unknown distribution are important in econometrics. This paper characterizes the class of all conditional moment restrictions that lead to $\sqrt{n}$-consistent estimators for these models. The semiparametric efficiency bound for each conditional moment restriction is derived. In the case of a nonzero bound it is shown how an estimator can be constructed, and that an appropriately weighted version can attain the efficiency bound. These estimators also work when the disturbance is independent of the regressors. The paper discusses selecting among several estimators in this case, as well as methods of combining them to improve efficiency.


1.        Introduction 

Censored and truncated regression models are important for econometric data with a limited dependent variable. Unlike in regression models without censoring or truncation, consistency of maximum likelihood estimators in these models depends on the distributional specification. This property has motivated a search for estimators that are robust to distributional assumptions. This work includes Powell (1984, 1986a, 1986b), Newey (1987, 1989a), Lee (1992, 1993), Honoré and Powell (1994), and others.

In this paper we characterize the class of all conditional moment restrictions that lead to $\sqrt{n}$-consistent estimators for censored and truncated regression. We derive the semiparametric efficiency bound for each conditional moment restriction and show when it is nonzero. For the nonzero cases we describe how an estimator can be constructed, and show that an appropriately weighted version can attain the semiparametric bound.

Because independence of the disturbance and regressors implies any conditional moment restriction (after a location shift that is absorbed by the constant term), all the estimators will work in the independence case. We discuss how to select among several such estimators in this case, as well as methods of combining them to improve efficiency. Whether this approach can be used to attain the semiparametric efficiency bound in the independence case remains an open question.

In relation to previous work, the semiparametric efficiency bounds for the censored case generalize results of Newey and Powell (1990), and those for the truncated case are new. Also, the censored regression estimators given here are based on a conditional moment restriction described by Newey (1989a) that generalizes the moment restriction of Powell (1986a). Lee (1992) considered construction of $\sqrt{n}$-consistent estimators from a special case of these moment conditions. The truncated regression moment restriction is similar to that of Newey (1987), and generalizes Lee (1993). Here we dispense with the symmetry assumption that was imposed by Lee (1992, 1993). Also, we generalize previous results by characterizing the entire class of useful moment conditions. This leads to estimators that have improved properties over those previously proposed, including asymptotic efficiency and ease of asymptotic variance estimation.
2.        The  Model

It is convenient to describe the class of models we consider in terms of a latent regression. Let $m(\varepsilon)$ be some scalar function. Then a latent regression equation with a conditional moment restriction can be described as

(2.1)    $y^* = X'\beta_0 + \varepsilon, \qquad E[m(\varepsilon)\mid X] = 0.$

Each such condition corresponds to the location restriction $\mu(X) = 0$ for $\mu(X)$ solving $E[m(\varepsilon - \mu(X))\mid X] = 0$. For example, if $m(\varepsilon) = \varepsilon$ then $\varepsilon$ has conditional mean zero, while if $m(\varepsilon) = 1(\varepsilon > 0) - 1(\varepsilon < 0)$ then $\varepsilon$ has conditional median zero. Other specifications of $m(\varepsilon)$ correspond to other location restrictions, some of which are less familiar than the median and mean.
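To see how the choice of $m(\varepsilon)$ determines which location is restricted, the sketch below (Python with NumPy, an illustrative choice) solves the sample analog of $E[m(\varepsilon - \mu)] = 0$ for $\mu$ by bisection, for three choices of $m$: the mean, the median, and a Huber-type function $m(\varepsilon) = \max\{-c, \min\{\varepsilon, c\}\}$ with $c = 1$. The Huber-type $m$, the skewed error distribution, and the helper name `location` are assumptions made for illustration; bisection relies on $m$ being nondecreasing.

```python
import numpy as np


def location(eps, m, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve mean(m(eps - mu)) = 0 for mu by bisection.

    Assumes m is nondecreasing, so g(mu) = mean(m(eps - mu)) is
    nonincreasing in mu and changes sign on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(m(eps - mid)) > 0:
            lo = mid  # g(mid) > 0: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_m(e):
    return e  # conditional mean restriction


def median_m(e):
    return (e > 0).astype(float) - (e < 0).astype(float)  # conditional median


def huber_m(e, c=1.0):
    return np.clip(e, -c, c)  # Huber-type: constant below -c and above c


rng = np.random.default_rng(0)
eps = rng.exponential(size=100_000) - 1.0  # right-skewed, mean zero

for name, m in [("mean", mean_m), ("median", median_m), ("huber", huber_m)]:
    print(name, round(location(eps, m), 3))
```

For this right-skewed distribution the three restrictions pick out three different locations: the mean is near 0, the median near $\ln 2 - 1 \approx -0.31$, and the Huber-type location lies in between.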

Censored and truncated regression models are ones where $(y^*, X)$ is only partially observed. For censored regression,

(2.2)    $y = \max\{0, y^*\}$    (Censored regression).

For truncated regression we have

(2.3)    $(y^*, X)$ only observed if $y^* > 0$    (Truncated regression).

These models are familiar in econometrics, and we focus on them to keep the exposition relatively simple. Our results can be extended to other models, including censored regression where the censoring point varies with $X$ or censoring occurs above as well as below.

In the latent model, where $(y^*, X)$ is always observed, it is well understood how to use a conditional moment restriction to estimate $\beta_0$. Equation (2.1) implies that for any vector of functions $A(X)$ the unconditional moment restriction $E[A(X)m(y^* - X'\beta)] = 0$ will be satisfied at $\beta = \beta_0$ (assuming expectations exist). This moment condition could be used to form a generalized method of moments (GMM) estimator in the usual way. However, consistency of a GMM estimator requires that $\beta_0$ be the unique solution to the moment equation, which is difficult to show. This identification problem motivates obtaining $\hat\beta$ by minimizing a corresponding objective function that "integrates back from" $m(\varepsilon)$. It is often easier to show that such an objective function has a unique minimum. To this end, let $q(\varepsilon) = \int_0^{\varepsilon} m(u)\,du + C$, where $C$ can be any constant. Also, let $w(X) \ge 0$ be a weight function that will be important for the efficiency discussion below. Then the moment restriction $E[w(X)Xm(\varepsilon)] = 0$ will be the first-order condition for $E[w(X)q(y^* - X'\beta)]$ to have an extremum (minimum or maximum) at $\beta_0$. Assume that the sign of $m(\varepsilon)$ is chosen so that $d(X) = \partial E[m(\varepsilon + \alpha)\mid X]/\partial\alpha\,|_{\alpha = 0} \ge 0$. Then $E[w(X)d(X)XX']$ will be positive semi-definite, the necessary second-order condition for $E[w(X)q(y^* - X'\beta)]$ to have a minimum at $\beta_0$. Thus $E[w(X)q(y^* - X'\beta)]$ is a function whose minimization corresponds to the moment restriction $E[w(X)Xm(y^* - X'\beta)] = 0$. The sample analog of the minimizer of $E[w(X)q(y^* - X'\beta)]$ is

(2.4)    $\hat\beta = \arg\min_{\beta \in B} \sum_{i=1}^n w(X_i)\,q(y_i^* - X_i'\beta),$

where $B$ is the parameter set. The identification condition for consistency of this estimator is that $E[w(X)q(y^* - X'\beta)]$ has a unique minimum at $\beta_0$, which is easier to show than that $E[w(X)Xm(y^* - X'\beta)] = 0$ has a unique solution.
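As a concrete instance of (2.4), take $m(\varepsilon) = \varepsilon$ and $C = 0$, so $q(\varepsilon) = \varepsilon^2/2$ and (2.4) reduces to weighted least squares in the latent model. The sketch below (the data-generating process and the weight function are assumptions chosen only for illustration) computes $\hat\beta$ in closed form and checks that the sample first-order condition $\sum_i w(X_i)X_i m(y_i^* - X_i'\hat\beta) = 0$ holds.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([1.0, 2.0])
ystar = X @ beta0 + rng.normal(size=n)   # latent data, always observed
w = 1.0 / (1.0 + X[:, 1] ** 2)           # an arbitrary bounded weight w(X) >= 0

# m(e) = e and C = 0 give q(e) = e**2 / 2, so (2.4) is weighted least squares.
XtW = X.T * w                            # column i of X' scaled by w(X_i)
beta_hat = np.linalg.solve(XtW @ X, XtW @ ystar)

# Sample first-order condition: sum_i w(X_i) X_i m(y_i - X_i'beta_hat) = 0.
foc = XtW @ (ystar - X @ beta_hat)
print(beta_hat, np.abs(foc).max())
```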

It turns out that in censored and truncated models an analogous approach works for some conditional moment restrictions, and that for the rest no $\sqrt{n}$-consistent estimator exists. The nonexistence result will follow from the form of the semiparametric efficiency bound for this model. This bound is the infimum of the information bounds for $\beta$ over regular parametric submodels (e.g. see Newey, 1990, for the definition of regular). It can often be computed by a projection. Define the tangent set $\mathcal{T}$ to be the mean-square closure of the set of all scores for parameters of the distribution of $(\varepsilon, X)$ in parametric submodels passing through the truth and satisfying equation (2.1), and let $S_\beta$ denote the score for $\beta$. Define the efficient score $S$ to be the residual from the mean-square projection of $S_\beta$ on $\mathcal{T}$, assuming $\mathcal{T}$ is linear. If $E[SS']$ is singular then no $\sqrt{n}$-consistent, regular estimator exists, while if $E[SS']$ is nonsingular then its inverse provides a bound on the asymptotic variance of regular $\sqrt{n}$-consistent estimators. Here we find that $S$ is zero except for certain cases where the moment condition leads to a $\sqrt{n}$-consistent estimator analogous to that in equation (2.4). We also find that the asymptotic variance of this estimator is equal to the semiparametric bound when $w(X)$ is chosen to have a certain form.


3.        Censored  Regression  Models. 

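For censored regression any moment condition where $m(\varepsilon)$ is constant for all $\varepsilon$ small enough leads to $\sqrt{n}$-consistent estimation. The fact that $m(\varepsilon)$ is constant below a certain value means that when $X'\beta$ is large enough, the function $m(y - X'\beta)$ will have the same value at the censored $y$ as at the latent $y^*$, so the conditional moment restriction is satisfied in the censored data. Figure 1 illustrates how this occurs.

To be precise, let $\ell$ equal the supremum of all points where $m(\varepsilon)$ is constant below that point, i.e. $\ell = \sup\{e : m(\varepsilon) = m(e)\ \forall\, \varepsilon \le e\}$, where we take $\ell = -\infty$ if the set is empty. Then, since $y = 0$ exactly when $y^* \le 0$,

(3.1)    $1(X'\beta > -\ell)\,m(y - X'\beta) = 1(-X'\beta < \ell)\{1(y = 0)\,m(-X'\beta) + 1(y > 0)\,m(y^* - X'\beta)\}$
         $= 1(-X'\beta < \ell)\{1(y^* \le 0)\,m(y^* - X'\beta) + 1(y^* > 0)\,m(y^* - X'\beta)\} = 1(X'\beta > -\ell)\,m(y^* - X'\beta),$

where the second equality holds because, when $-X'\beta < \ell$ and $y^* \le 0$, both $-X'\beta$ and $y^* - X'\beta \le -X'\beta$ lie in the region below $\ell$ where $m$ is constant. This leads to the conditional moment restriction

(3.2)    $E[1(X'\beta_0 > -\ell)\,m(y - X'\beta_0)\mid X] = 1(v > -\ell)\,E[m(\varepsilon)\mid X] = 0,$

where $v = X'\beta_0$.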
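The identity (3.1) can be checked numerically. The sketch below assumes a Huber-type $m(\varepsilon) = \max\{-c, \min\{\varepsilon, c\}\}$, which is constant below $\ell = -c$, simulates censored data, and compares $1(X'\beta > -\ell)m(y - X'\beta)$ with $1(X'\beta > -\ell)m(y^* - X'\beta)$ at an arbitrary trial value of $\beta$. The design and parameter values are illustrative assumptions.

```python
import numpy as np

c = 1.0
ell = -c                                  # m below is constant for e < -c


def m(e):
    return np.clip(e, -c, c)              # Huber-type m


rng = np.random.default_rng(2)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
ystar = X @ np.array([0.5, 1.0]) + rng.normal(size=n)
y = np.maximum(0.0, ystar)                # censoring as in (2.2)

beta = np.array([0.3, 0.8])               # any trial value, not necessarily beta_0
v = X @ beta
ind = (v > -ell).astype(float)            # 1(X'beta > -ell)
lhs = ind * m(y - v)                      # uses the censored y
rhs = ind * m(ystar - v)                  # uses the latent y*
print(np.max(np.abs(lhs - rhs)))          # 0.0: the two sides agree exactly
print(np.max(np.abs(m(y - v) - m(ystar - v))))  # positive without the indicator
```

Without the indicator the two sides differ, which is why only observations with $X_i'\beta > -\ell$ carry information in (3.2).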

As discussed in Section 2, integrating back to an objective function can lead to better identification conditions. To integrate back, note that for a scalar $\alpha$,

(3.3)    $1(\alpha > -\ell)\,m(y - \alpha) = -\partial q(y - \max\{\alpha, -\ell\})/\partial\alpha,$

except where $\alpha = -\ell$. This means that $E[w(X)X\,1(v > -\ell)\,m(y - v)] = 0$, as implied by equation (3.2), is the first-order condition corresponding to minimization of $E[w(X)q(y - \max\{X'\beta, -\ell\})]$. An estimator based on the sample analog of this minimization is

(3.4)    $\hat\beta = \arg\min_{\beta \in B} \sum_{i=1}^n w(X_i)\,q(y_i - \max\{X_i'\beta, -\ell\}).$

This estimator is the extension of that of equation (2.4) to censored regression.
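A minimal sketch of estimator (3.4), assuming $w(X) = 1$ by default and the Huber-type $m(\varepsilon) = \max\{-c, \min\{\varepsilon, c\}\}$, so that $\ell = -c$ and $q$ is the Huber loss. The minimization uses damped Newton steps whose gradient comes from (3.3) and whose Hessian is the almost-everywhere derivative $\sum_i w(X_i)1(X_i'\beta > -\ell)m_\varepsilon(y_i - X_i'\beta)X_iX_i'$; the function also returns the sandwich variance $\hat Q^{-1}\hat\Sigma\hat Q^{-1}/n$ described later in this section. The function names, the Newton scheme, the OLS starting value, and the simulated design are assumptions for illustration, not the paper's algorithm. Because the objective is not convex in $\beta$ (through $\max\{X'\beta, -\ell\}$), the starting value can matter in practice.

```python
import numpy as np


def huber_parts(c):
    """Huber-type m, its a.e. derivative m_eps, and q(e) = int_0^e m(u) du."""
    m = lambda e: np.clip(e, -c, c)
    dm = lambda e: (np.abs(e) < c).astype(float)
    q = lambda e: np.where(np.abs(e) <= c, 0.5 * e**2, c * np.abs(e) - 0.5 * c**2)
    return m, dm, q


def censored_m_estimator(X, y, c, w=None, max_iter=200, tol=1e-10):
    """Minimize sum_i w_i q(y_i - max{X_i'b, -ell}) with ell = -c, as in (3.4).

    Returns the estimate and the sandwich variance Q^{-1} Sigma Q^{-1} / n.
    """
    n = len(y)
    w = np.ones(n) if w is None else w
    m, dm, q = huber_parts(c)
    ell = -c
    obj = lambda b: np.sum(w * q(y - np.maximum(X @ b, -ell)))
    b = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting value
    for _ in range(max_iter):
        v = X @ b
        ind = (v > -ell).astype(float)
        r = y - v
        score = X.T @ (w * ind * m(r))             # minus the gradient, by (3.3)
        hess = (X.T * (w * ind * dm(r))) @ X       # a.e. Hessian (= n * Q-hat)
        step = np.linalg.solve(hess, score)
        t, f0 = 1.0, obj(b)
        while obj(b + t * step) > f0 and t > 1e-8:  # backtracking line search
            t *= 0.5
        b = b + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    v = X @ b
    ind = (v > -ell).astype(float)
    r = y - v
    Q = (X.T * (w * ind * dm(r))) @ X / n
    Sigma = (X.T * (w**2 * ind * m(r) ** 2)) @ X / n
    Qinv = np.linalg.inv(Q)
    return b, Qinv @ Sigma @ Qinv / n


rng = np.random.default_rng(3)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([1.0, 1.0])
# Symmetric errors, so E[m(eps)] = 0 holds for this Huber-type m.
y = np.maximum(0.0, X @ beta0 + rng.normal(size=n))
beta_hat, V_hat = censored_m_estimator(X, y, c=1.0)
print(beta_hat, np.sqrt(np.diag(V_hat)))
```

With the median function in place of the Huber-type $m$, $q(\varepsilon) = |\varepsilon|$ and $\ell = 0$, so (3.4) becomes the censored least absolute deviations objective of Powell (1984); Newton steps are then unavailable because $m_\varepsilon$ is zero almost everywhere.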

The moment condition of equation (3.2) is critically dependent on $m(\varepsilon)$ being constant for all $\varepsilon$ small enough. Without this property, no $\sqrt{n}$-consistent estimator will exist. This result follows from the form of the semiparametric information bound. To derive that bound we impose the following condition. Let $\mu_X$ denote the probability distribution of $X$ and $\lambda$ denote Lebesgue measure.


Assumption 3.1: $(\varepsilon_i, X_i')$ is i.i.d. with distribution that is absolutely continuous with respect to $\lambda \times \mu_X$; for $\mu_X$-almost all $X$ there is $f_\varepsilon(\varepsilon\mid X)$ such that $f(\varepsilon\mid X) = \int_{-\infty}^{\varepsilon} f_\varepsilon(u\mid X)\,du$ and $E[(1 + \|X\|^2)\{1 + \int [f_\varepsilon(u\mid X)^2/f(u\mid X)]\,du\}] < \infty$; as a function of $\alpha$, $E[m(\varepsilon + \alpha)^2\mid X]$ is bounded in a neighborhood of every $\alpha$, and $E[m(\varepsilon)^2\mid X] > 0$ with probability one; $\mathrm{Prob}(v = -\ell) = 0$; and $E[\|X\|^2 d(X)^2/E[m(\varepsilon)^2\mid X]] < \infty$.

Let $w^*(X) = (E[m(\varepsilon)^2\mid X])^{-1}d(X)$. We will also impose Assumption A.1 of the Appendix on the parametric submodels.

Theorem 3.1: If Assumptions 3.1 and A.1 are satisfied then the efficient score is

$S = w^*(X)X\,1(X'\beta_0 > -\ell)\,m(y - X'\beta_0).$

If $E[SS']$ is nonsingular then $(E[SS'])^{-1}$ is the semiparametric variance bound.


Since the efficient score is identically zero unless $\ell$ is finite, $m(\varepsilon)$ being constant below some value is a necessary condition for existence of a (regular) $\sqrt{n}$-consistent estimator. That is, there will be no $\sqrt{n}$-consistent estimator unless $\hat\beta$ from equation (3.4) is available. Furthermore, as shown below, for $w(X) = w^*(X)$ the asymptotic variance of $\hat\beta$ will equal the bound. In this sense there is no additional information available for estimating $\beta_0$ beyond that used by $\hat\beta$.
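To make $w^*(X) = d(X)/E[m(\varepsilon)^2\mid X]$ concrete, the sketch below evaluates it in closed form for the Huber-type $m(\varepsilon) = \max\{-c, \min\{\varepsilon, c\}\}$ under the assumption that $\varepsilon \mid X \sim N(0, \sigma(X)^2)$. In that case $d(X) = P(|\varepsilon| < c \mid X) = 2\Phi(c/\sigma) - 1$ and $E[m(\varepsilon)^2\mid X] = \sigma^2\{2\Phi(c/\sigma) - 1 - 2(c/\sigma)\phi(c/\sigma)\} + 2c^2\{1 - \Phi(c/\sigma)\}$, with $\Phi$ and $\phi$ the standard normal CDF and density. Both the error distribution and the choice of $m$ are illustrative assumptions.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf


def optimal_weight(sigma, c):
    """w*(X) = d(X) / E[m(eps)^2 | X] for the Huber-type m, if eps | X ~ N(0, sigma^2)."""
    k = c / sigma
    d = 2.0 * Phi(k) - 1.0                               # d(X) = P(|eps| < c | X)
    second_moment = sigma**2 * (d - 2.0 * k * phi(k)) + 2.0 * c**2 * (1.0 - Phi(k))
    return d / second_moment


for s in (0.5, 1.0, 2.0):
    print(s, round(optimal_weight(s, c=1.0), 4))
```

In this example the weight falls as $\sigma(X)$ rises, so noisier observations are downweighted, and as $c$ grows it approaches the familiar weight $1/\sigma(X)^2$.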

This result sidesteps the identification question for $\beta_0$. Despite the lack of a $\sqrt{n}$-consistent estimator, it could be that $\beta_0$ is identified "at infinity," by approximate satisfaction of $E[m(y - X'\beta_0)\mid X] = 0$ for large values of $X'\beta_0$. Chamberlain (1986) shows that this is possible for the sample selection model, and similar reasoning may apply here.

To show $\sqrt{n}$-consistency of $\hat\beta$ it is useful to make additional assumptions. Let

$Q = E[w(X)d(X)1(v > -\ell)XX'].$

Assumption 3.2: $E[m(\varepsilon + \alpha)\mid X] \ge (\le)\ 0$ for $\alpha \ge (\le)\ 0$, and $Q$ exists and is nonsingular. Also, $\beta_0 \in \mathrm{interior}(B)$, $B$ is compact, $m(\varepsilon)$ is bounded and continuous almost everywhere, and $w(X) \ge 0$ is bounded.


This condition imposes a "single crossing" property for $E[m(\varepsilon + \alpha)\mid X]$: its sign does not change on either side of $\alpha = 0$. A simple sufficient condition for single crossing is that $m(\varepsilon)$ is monotonically nondecreasing (i.e. $q(\varepsilon)$ is convex). Restricting attention to bounded $m(\varepsilon)$ does not seem too stringent, because its lower tail must be constant anyway. This restriction could be relaxed at the expense of complicating the conditions. With this condition in place we can obtain a consistency and asymptotic normality result for the estimator. Let $\Sigma = E[w(X)^2 1(v > -\ell)m(\varepsilon)^2 XX']$ and $V = Q^{-1}\Sigma Q^{-1}$.

Theorem 3.2: If Assumptions 3.1 and 3.2 are satisfied then $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V)$. Furthermore, if $w(X) = w^*(X) = d(X)/E[m(\varepsilon)^2\mid X]$ then $V = (E[SS'])^{-1}$.

This result also shows that the weighted m-estimator would attain the semiparametric efficiency bound if the weight were equal to $w^*(X)$. In this sense no information is lost by using an estimator like that of equation (3.4).

An asymptotic variance estimator is needed for large-sample inference procedures based on Theorem 3.2. This can be formed in the usual way as $\hat Q^{-1}\hat\Sigma\hat Q^{-1}$ for estimators $\hat Q$ and $\hat\Sigma$. An estimator of $\Sigma$ is straightforward to construct, as

$\hat\Sigma = \sum_{i=1}^n 1(\hat v_i > -\ell)\,w(X_i)^2\,m(y_i - \hat v_i)^2 X_iX_i'/n,$

where $\hat v_i = X_i'\hat\beta$. An estimator of $Q$ is more difficult, because it involves the derivative $d(X) = \partial E[m(\varepsilon + \alpha)\mid X]/\partial\alpha$. If $m(\varepsilon) = \int_0^{\varepsilon} m_\varepsilon(u)\,du + C$ for some $m_\varepsilon(\varepsilon)$ that is continuous almost everywhere and a constant $C$, then

$\hat Q = \sum_{i=1}^n 1(\hat v_i > -\ell)\,w(X_i)\,m_\varepsilon(y_i - \hat v_i)\,X_iX_i'/n$

will do. Otherwise, $d(X)$ will need to be approximated. This may be done by a numerical derivative, as in

$\hat Q_\delta = \sum_{i=1}^n w(X_i)X_iX_i'\,[q(y_i - \max\{\hat v_i + \delta, -\ell\}) + q(y_i - \max\{\hat v_i - \delta, -\ell\}) - 2q(y_i - \max\{\hat v_i, -\ell\})]/(\delta^2 n).$

The following result shows consistency of the corresponding estimators.

Theorem 3.3: Suppose that Assumptions 3.1 and 3.2 are satisfied. If $m(\varepsilon) = \int_0^{\varepsilon} m_\varepsilon(u)\,du + C$ and $m_\varepsilon(\varepsilon)$ is continuous almost everywhere then $\hat Q^{-1}\hat\Sigma\hat Q^{-1} \xrightarrow{p} V$. Also, if $X = (1, x')'$, $E[w(X)\|X\|^6] < \infty$, $\delta \to 0$, and $n^{1/2}\delta \to \infty$ then $\hat Q_\delta^{-1}\hat\Sigma\hat Q_\delta^{-1} \xrightarrow{p} V$.

Imposing the sixth moment condition simplifies the proof of this result, although it could probably be weakened. Also, it would be useful to have guidelines for the choice of $\delta$ in practice, but these are beyond the scope of this paper.
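The numerical derivative $\hat Q_\delta$ can be illustrated with the median function, for which $m_\varepsilon$ is zero almost everywhere and the derivative-based $\hat Q$ is unavailable. Then $q(\varepsilon) = |\varepsilon|$ (taking $C = 0$), $\ell = 0$, and $d(X) = 2f(0\mid X)$. The sketch below assumes standard normal errors independent of $X$ and, for simplicity, evaluates $\hat Q_\delta$ at the true $\beta_0$ rather than at an estimate; in this design every $X_i'\beta_0$ exceeds $\delta$, so $\hat Q_\delta$ should be close to $2f(0)\sum_i X_iX_i'/n$. The design, $\delta = 0.1$, and the function name `Q_delta` are illustrative assumptions.

```python
import numpy as np


def Q_delta(X, y, beta, q, ell, w=None, delta=0.1):
    """Second-difference estimator Q_delta, evaluated at a given beta."""
    n = len(y)
    w = np.ones(n) if w is None else w
    v = X @ beta
    obj_i = lambda a: q(y - np.maximum(a, -ell))          # q(y_i - max{a, -ell})
    second = (obj_i(v + delta) + obj_i(v - delta) - 2.0 * obj_i(v)) / delta**2
    return (X.T * (w * second)) @ X / n


rng = np.random.default_rng(4)
n = 200_000
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
beta0 = np.array([1.0, 1.0])                              # so X'beta0 lies in [1, 2]
y = np.maximum(0.0, X @ beta0 + rng.normal(size=n))

# Median case: m(e) = 1(e >= 0) - 1(e < 0), q(e) = |e|, ell = 0.
Qd = Q_delta(X, y, beta0, q=np.abs, ell=0.0, delta=0.1)
Q_target = 2.0 / np.sqrt(2.0 * np.pi) * (X.T @ X) / n    # 2 f(0) sum_i X_i X_i' / n
print(np.round(Qd, 3))
print(np.round(Q_target, 3))
```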

Construction of an efficient semiparametric estimator, one that attains the bound, would require nonparametric estimation of the optimal weight $w^*(X)$. Such a result would generalize Newey and Powell's (1990) efficient estimator for censored regression with zero median. Derivation of such an estimator is beyond the scope of this paper.
As examples consider the conditional median and mean zero cases. In the zero conditional median case, the function $m(\varepsilon) = 1(\varepsilon \ge 0) - 1(\varepsilon < 0)$ is constant below $0$, leading to the efficient score $S = 2f(0\mid X)X\,1(X'\beta_0 > 0)\,m(y - X'\beta_0)$. In the conditional mean case, the function $m(\varepsilon) = \varepsilon$ is not constant below any $\ell$, and hence the efficient score is zero. Consequently, no $\sqrt{n}$-consistent regular estimator will exist in this case.
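In the median case the bound $(E[SS'])^{-1}$ can be compared with the asymptotic variance $V$ of Theorem 3.2 for the unweighted estimator, $w(X) = 1$. Since $E[m(\varepsilon)^2\mid X] = 1$ and $d(X) = 2f(0\mid X)$, the two coincide when $f(0\mid X)$ does not depend on $X$, while under heteroskedasticity $V - (E[SS'])^{-1}$ is positive semi-definite. The sketch below checks both statements with expectations over $X$ replaced by sample averages, assuming $\varepsilon\mid X \sim N(0, \sigma(X)^2)$; the designs and the scale function $\sigma(X) = \exp(x/2)$ are illustrative assumptions.

```python
import numpy as np


def bound_and_unweighted_variance(X, beta0, sigma):
    """Median case: (E[SS'])^{-1} and V for w(X) = 1, with E over X replaced by sample means.

    Assumes eps | X ~ N(0, sigma(X)^2), so f(0|X) = 1 / (sigma(X) sqrt(2 pi)),
    E[m(eps)^2 | X] = 1, d(X) = 2 f(0|X), and ell = 0.
    """
    n = len(X)
    ind = (X @ beta0 > 0).astype(float)
    f0 = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    avg = lambda wts: (X.T * wts) @ X / n                 # average of wts * X X'
    bound = np.linalg.inv(avg(ind * 4.0 * f0**2))         # (E[SS'])^{-1}
    Qinv = np.linalg.inv(avg(ind * 2.0 * f0))             # Q with w = 1
    return bound, Qinv @ avg(ind) @ Qinv                  # Sigma = E[1(v > 0) XX']


rng = np.random.default_rng(5)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([0.5, 1.0])
b_hom, v_hom = bound_and_unweighted_variance(X, beta0, np.ones(n))
b_het, v_het = bound_and_unweighted_variance(X, beta0, np.exp(0.5 * X[:, 1]))
print(np.allclose(b_hom, v_hom))                          # homoskedastic: equal
print(np.linalg.eigvalsh(v_het - b_het).min())            # heteroskedastic: >= 0
```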


4.        Truncated  Regression  Models. 


For truncated regression the special characteristic that leads to a useful moment condition is that $m(\varepsilon)$ is zero for all $\varepsilon$ small enough. That feature means that for $X'\beta$ large enough, $m(y^* - X'\beta) = 0$ as $y^*$ goes from $0$ to $-\infty$, so truncation removes only observations at which $m(y^* - X'\beta)$ is zero. Hence the conditional moment restriction $E[m(y - X'\beta_0)\mid X] = 0$ will be satisfied in the truncated data for $X'\beta_0$ large enough. Figure 2 illustrates this condition.

To be precise, let $k$ equal the supremum of all points where $m(\varepsilon)$ is zero below that point, i.e. $k = \sup\{e : m(\varepsilon) = 0\ \forall\, \varepsilon \le e\}$, where we take $k = -\infty$ if the set is empty. Let $E^*[\cdot]$ denote the expectation for the latent model, $E[\cdot]$ the expectation for the observed data, and $P^*(A)$ the probability of an event $A$ in the latent model. For $\pi(X) = E^*[1(y^* > 0)\mid X]$ note that $E[\cdot\mid X] = E^*[1(y^* > 0)(\cdot)\mid X]/\pi(X)$. Then, since $1(y^* \le 0)1(v > -k)m(y^* - v) = 0$, we have

(4.1)    $E[1(X'\beta_0 > -k)\,m(y - X'\beta_0)\mid X] = \pi(X)^{-1}E^*[1(v > -k)1(y^* > 0)\,m(y^* - v)\mid X]$
         $= \pi(X)^{-1}E^*[1(v > -k)\,m(\varepsilon)\mid X] = \pi(X)^{-1}1(v > -k)\,E^*[m(\varepsilon)\mid X] = 0.$
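Equation (4.1) can be illustrated by simulation. The sketch assumes $m(\varepsilon) = \varepsilon 1(|\varepsilon| < c)$, which is zero below $k = -c$, standard normal errors, and truncation at zero. Among observed points with $v = X'\beta_0 > -k$ the average of $m(y - v)$ should be near zero, while for smaller $v$ truncation removes negative values of $m$ and the average is biased upward. The function and the design are illustrative assumptions.

```python
import numpy as np

c = 1.0
k = -c                                    # m below is zero for e < -c


def m(e):
    return e * (np.abs(e) < c)            # m(e) = e 1(|e| < c)


rng = np.random.default_rng(6)
n = 200_000
x = rng.normal(size=n)
v = 0.5 + 1.0 * x                         # v = X'beta_0
ystar = v + rng.normal(size=n)
keep = ystar > 0                          # truncation as in (2.3)
v_obs, y_obs = v[keep], ystar[keep]

r = m(y_obs - v_obs)
print(r[v_obs > -k].mean())               # near zero: (4.1) applies when v > -k
print(r[np.abs(v_obs) < 0.5].mean())      # clearly positive: truncation bias
```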


Integrating back to an objective function as was done in Section 3 leads to the estimator

(4.2)    $\hat\beta = \arg\min_{\beta \in B}\sum_{i=1}^n w(X_i)\,q(y_i - \max\{X_i'\beta, -k\}).$


The analysis of the properties of this estimator is exactly analogous to that for the censored regression case in Section 3. If $m(\varepsilon)$ is not zero below some value then no $\sqrt{n}$-consistent estimator will exist; otherwise, the semiparametric efficiency bound will correspond to the asymptotic variance of $\hat\beta$ for $w(X) = w^*(X) = d(X)/E[m(\varepsilon)^2\mid X]$. The semiparametric efficiency bound is given in the following result:

Theorem 4.1: If Assumption 3.1 is satisfied and $P^*(y^* > 0) > 0$ then the efficient score is

$S = w^*(X)X\,1(X'\beta_0 > -k)\,m(y - X'\beta_0).$

If $E[SS']$ is nonsingular then $(E[SS'])^{-1}$ is the semiparametric efficiency bound.


Asymptotic normality will also hold under conditions similar to those in Section 3. Here let

$Q = E[w(X)1(v > -k)d(X)XX'], \qquad \Sigma = E[w(X)^2 1(v > -k)m(\varepsilon)^2 XX'], \qquad V = Q^{-1}\Sigma Q^{-1}.$

The following result gives asymptotic normality.

Theorem 4.2: If Assumptions 3.1 and 3.2 are satisfied and $P^*(y^* > 0) > 0$ then $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V)$. Furthermore, if $w(X) = w^*(X) = d(X)/E[m(\varepsilon)^2\mid X]$ then $V = (E[SS'])^{-1}$.


Here it should be noted that $m(\varepsilon)$ being zero below some value rules out $m(\varepsilon)$ being monotonic, so that other primitive conditions for the "single crossing" identification condition of Assumption 3.2 must be found. One such condition is that the conditional distribution of $\varepsilon$ given $X$ is symmetric (in $\varepsilon$) around zero with unimodal conditional density, and that $m(\varepsilon)$ is an odd function that is nonnegative for $\varepsilon \ge 0$. Then, for some choice of $C$, $-q(\varepsilon) = -\int_0^{\varepsilon} m(t)\,dt - C$ will be proportional to a unimodal symmetric density function, so that, as in Bierens (1981) Lemma 3.2.1, $-E[q(\varepsilon + \alpha)\mid X]$ is proportional to a unimodal density function as a function of $\alpha$. This unimodality property will imply that $E[m(\varepsilon + \alpha)\mid X] = -\partial\{-E[q(\varepsilon + \alpha)\mid X]\}/\partial\alpha$ is nonnegative for $\alpha > 0$, and give the single crossing property of Assumption 3.2.
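The single crossing argument can be checked in a simple case. Assuming $\varepsilon \sim N(0, 1)$ (symmetric and unimodal) and the odd function $m(\varepsilon) = \varepsilon 1(|\varepsilon| < c)$, which is nonnegative for $\varepsilon \ge 0$, a direct calculation gives $E[m(\varepsilon + \alpha)] = \phi(-c - \alpha) - \phi(c - \alpha) + \alpha\{\Phi(c - \alpha) - \Phi(-c - \alpha)\}$. The sketch below evaluates this expression on a grid of $\alpha$ and confirms the sign pattern; the distribution and the choice of $m$ are illustrative assumptions.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mean_shifted_m(alpha, c=1.0):
    """E[m(eps + alpha)] for m(e) = e 1(|e| < c) and eps ~ N(0, 1), in closed form."""
    lo, hi = -c - alpha, c - alpha
    return phi(lo) - phi(hi) + alpha * (Phi(hi) - Phi(lo))


vals = [(a / 10, mean_shifted_m(a / 10)) for a in range(-50, 51)]
print(all(val >= 0 for a, val in vals if a > 0))   # nonnegative for alpha > 0
print(all(val <= 0 for a, val in vals if a < 0))   # nonpositive for alpha < 0
```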

A consistent estimator of the asymptotic variance can be formed in a way analogous to that for censored regression. Let

$\hat\Sigma = \sum_{i=1}^n 1(\hat v_i > -k)\,w(X_i)^2\,m(y_i - \hat v_i)^2 X_iX_i'/n, \qquad \hat Q = \sum_{i=1}^n 1(\hat v_i > -k)\,w(X_i)\,m_\varepsilon(y_i - \hat v_i)\,X_iX_i'/n,$

when $m(\varepsilon)$ is differentiable. Otherwise let

$\hat Q_\delta = \sum_{i=1}^n w(X_i)X_iX_i'\,[q(y_i - \max\{\hat v_i + \delta, -k\}) + q(y_i - \max\{\hat v_i - \delta, -k\}) - 2q(y_i - \max\{\hat v_i, -k\})]/(\delta^2 n).$


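The following result shows consistency of the corresponding estimators.

Theorem 4.3: Suppose that Assumptions 3.1 and 3.2 are satisfied. If $m(\varepsilon) = \int_0^{\varepsilon} m_\varepsilon(u)\,du + C$ and $m_\varepsilon(\varepsilon)$ is continuous almost everywhere then $\hat Q^{-1}\hat\Sigma\hat Q^{-1} \xrightarrow{p} V$. Also, if $X = (1, x')'$, $E[w(X)\|X\|^6] < \infty$, $\delta \to 0$, and $n^{1/2}\delta \to \infty$ then $\hat Q_\delta^{-1}\hat\Sigma\hat Q_\delta^{-1} \xrightarrow{p} V$.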


The differentiable case is a useful one, because no bandwidth (i.e. $\delta$) choice is required for these estimators. For the truncated case the estimators for differentiable $m(\varepsilon)$ seem to be the first moment condition estimators for the truncated model that avoid the use of a bandwidth in estimating the asymptotic variance.
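The bandwidth-free case can be sketched for estimator (4.2) with $w(X) = 1$. The sketch assumes Tukey's biweight function $m(\varepsilon) = \varepsilon\{1 - (\varepsilon/c)^2\}^2 1(|\varepsilon| < c)$, which is odd, nonnegative for $\varepsilon \ge 0$, continuously differentiable, and zero below $k = -c$, so $\hat Q$ can use $m_\varepsilon(\varepsilon) = \{1 - (\varepsilon/c)^2\}\{1 - 5(\varepsilon/c)^2\}$ for $|\varepsilon| < c$ and no $\delta$ is needed. Because $m_\varepsilon$ can be negative, the sketch minimizes the objective by iteratively reweighted least squares with weights $1(X_i'\beta > -k)m(r_i)/r_i$ plus a backtracking check, rather than by Newton steps. The tuning constant $c = 3$, the algorithm, and the simulated design are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

c = 3.0   # biweight tuning constant (illustrative); m is zero below k = -c


def m(e):
    u = e / c
    return np.where(np.abs(u) < 1, e * (1 - u**2) ** 2, 0.0)


def dm(e):                                 # m_eps(e), used in Q-hat
    u = e / c
    return np.where(np.abs(u) < 1, (1 - u**2) * (1 - 5 * u**2), 0.0)


def q(e):                                  # q(e) = int_0^e m(t) dt, taking C = 0
    u = np.minimum(np.abs(e) / c, 1.0)
    return c**2 / 6 * (1 - (1 - u**2) ** 3)


def truncated_m_estimator(X, y, max_iter=500, tol=1e-10):
    """Estimator (4.2) with w = 1 and -k = c, by IRLS with backtracking.

    Returns the estimate and the bandwidth-free variance Q^{-1} Sigma Q^{-1} / n.
    """
    obj = lambda b: np.sum(q(y - np.maximum(X @ b, c)))
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS starting value
    for _ in range(max_iter):
        v = X @ b
        u = (y - v) / c
        wts = (v > c) * np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # 1(v > -k) m(r)/r
        target = np.linalg.solve((X.T * wts) @ X, (X.T * wts) @ y)
        step, t, f0 = target - b, 1.0, obj(b)
        while obj(b + t * step) > f0 and t > 1e-8:     # backtracking
            t *= 0.5
        b = b + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    n, v = len(y), X @ b
    ind = (v > c).astype(float)
    r = y - v
    Q = (X.T * (ind * dm(r))) @ X / n                  # no bandwidth needed
    Sigma = (X.T * (ind * m(r) ** 2)) @ X / n
    Qinv = np.linalg.inv(Q)
    return b, Qinv @ Sigma @ Qinv / n


rng = np.random.default_rng(7)
N = 30_000
X_all = np.column_stack([np.ones(N), rng.normal(size=N)])
beta0 = np.array([3.0, 1.0])
ystar = X_all @ beta0 + rng.normal(size=N)             # symmetric, unimodal errors
keep = ystar > 0                                       # only y* > 0 is observed
X, y = X_all[keep], ystar[keep]
beta_hat, V_hat = truncated_m_estimator(X, y)
print(beta_hat, np.sqrt(np.diag(V_hat)))
```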


5.        Independence of $\varepsilon$ and $X$.

The sensitivity of the efficiency bound results to the form of the conditional location restriction is troubling. For instance, the parameters can be $\sqrt{n}$-consistently estimated when the disturbance has conditional median zero, but not under conditional mean zero. Some economic models do imply specific location restrictions, such as conditional mean restrictions in rational expectations models. Often, though, we have no strong a priori reasons for choosing one location restriction over another.
Things are different when the disturbance is independent of the regressors. In that case, corresponding to each moment restriction there will be a location shift of $\varepsilon$ that satisfies it (whenever such a shift exists). Consequently, if a constant is included in the regressors, any conditional moment restriction will be satisfied at the true slopes and some constant, including the ones that lead to $\sqrt{n}$-consistent estimation of censored or truncated regression. Thus, any of the estimators will be $\sqrt{n}$-consistent for the slope coefficients in the independence case.

To be specific, consider any function $m(\varepsilon)$ such that there is some $\mu_m$ with $E[m(\varepsilon - \mu_m)] = 0$. Suppose that $X = (1, x')'$ includes a constant, and partition $\beta = (\beta_1, \beta_2')'$ conformably with $X$. Then for $\beta_{m1} = \beta_{01} + \mu_m$ and $\beta_m = (\beta_{m1}, \beta_{02}')'$,

(5.1)    $E[m(y^* - X'\beta_m)\mid X] = E[m(\varepsilon - \mu_m)\mid X] = E[m(\varepsilon - \mu_m)] = 0.$

So, a conditional moment restriction is satisfied in the latent data. Consequently, any of the estimators we have considered for the censored and truncated regression models that are $\sqrt{n}$-consistent under some conditional moment restriction should be $\sqrt{n}$-consistent under independence for the slope parameters $\beta_{02}$. The estimator of the constant will also be $\sqrt{n}$-consistent for the original constant plus $\mu_m$.
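The location shift in (5.1) can be seen directly. The sketch below assumes skewed errors $\varepsilon = E - 1$, with $E$ standard exponential and independent of a scalar regressor, and the Huber-type $m(\varepsilon) = \max\{-1, \min\{\varepsilon, 1\}\}$. It computes $\mu_m$ from the sample analog of $E[m(\varepsilon - \mu_m)] = 0$ (using the simulated errors directly, which is only possible in a simulation) and checks that $m(y^* - X'\beta_m)$ averages to about zero within bins of the regressor when the intercept is shifted by $\mu_m$ and the slope is left at its true value. The design and helper names are illustrative assumptions.

```python
import numpy as np


def location(eps, m, lo=-20.0, hi=20.0, tol=1e-10):
    """Solve mean(m(eps - mu)) = 0 for mu by bisection (m nondecreasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.mean(m(eps - mid)) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


def m(e):
    return np.clip(e, -1.0, 1.0)             # Huber-type m with c = 1


rng = np.random.default_rng(10)
n = 200_000
x = rng.normal(size=n)
eps = rng.exponential(size=n) - 1.0          # skewed, independent of x
ystar = 1.0 + 2.0 * x + eps                  # beta_0 = (1, 2)

mu_m = location(eps, m)                      # location shift for this m
resid = m(ystar - (1.0 + mu_m) - 2.0 * x)    # shifted intercept, true slope
bins = np.digitize(x, [-1.0, 0.0, 1.0])      # four bins of the regressor
print(round(mu_m, 3), [round(float(resid[bins == j].mean()), 4) for j in range(4)])
```

At the unshifted intercept the same $m$ averages to roughly $E[m(\varepsilon)] = -e^{-2} \approx -0.135$ in this design, so the moment condition holds only after the shift.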

It should be emphasized that the conditional moment restriction in equation (5.1) depends on independence, and not on any other restriction on the distribution of the disturbance. In particular, with independence, the symmetry assumption of Lee (1993) is not needed for the estimation of a truncated regression. Although symmetry is part of the primitive conditions for the single crossing property for truncated regression given in Section 4, it is not the fundamental identification condition that leads to equation (5.1) being satisfied.

When $\varepsilon$ and $X$ are independent, the asymptotic variance of the estimators simplifies and the optimal weight $w(X)$ is equal to $1$ (up to scale). By independence, neither $d(X)$ nor $E[m(\varepsilon)^2\mid X]$ depends on $X$. Let $d_m = \partial E[m(\varepsilon - \mu_m + \alpha)]/\partial\alpha\,|_{\alpha = 0}$ and $\sigma_m^2 = E[m(\varepsilon - \mu_m)^2]$. Then for censored regression, Theorem 3.2 will hold with $\ell_m$ defined analogously to $\ell$ in Section 3, $v = X'\beta_m$, and

(5.2)    $Q = d_m E[w(X)1(v > -\ell_m)XX'], \qquad \Sigma = \sigma_m^2 E[w(X)^2 1(v > -\ell_m)XX'],$
         $V = d_m^{-2}\sigma_m^2\,(E[w(X)1(v > -\ell_m)XX'])^{-1}E[w(X)^2 1(v > -\ell_m)XX'](E[w(X)1(v > -\ell_m)XX'])^{-1}.$

Furthermore, by a standard result (see the proof of Theorem 5.1), $V$ is minimized (in the positive semi-definite sense) at $w(X) = 1$, where

(5.3)    $V = d_m^{-2}\sigma_m^2\,(E[1(v > -\ell_m)XX'])^{-1}.$
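One way to see that $V$ in (5.2) is minimized at $w(X) = 1$: writing $1 = 1(v > -\ell_m)$, the matrix $E[w^2 1XX'] - E[w1XX'](E[1XX'])^{-1}E[w1XX']$ is positive semi-definite because it is a Schur complement of the second-moment matrix of $1 \cdot (w(X)X', X')'$, and pre- and post-multiplying by $(E[w1XX'])^{-1}$ gives $V(w) \ge V(1)$. The sketch below checks this numerically with expectations replaced by sample averages; the weight function, the value $1.3$ standing in for $d_m^{-2}\sigma_m^2$, and the design are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
ind = (X @ np.array([0.2, 1.0]) > 0.0).astype(float)   # 1(v > -ell_m), with ell_m = 0 here
w = 1.0 + 0.8 * np.sin(3.0 * X[:, 1])                   # a bounded weight function w(X) > 0
scale = 1.3                                             # stands in for d_m^{-2} sigma_m^2


def avg(wts):
    return (X.T * wts) @ X / n                          # sample analog of E[wts X X']


A = np.linalg.inv(avg(w * ind))
V_w = scale * A @ avg(w**2 * ind) @ A                   # (5.2) with weight w
V_1 = scale * np.linalg.inv(avg(ind))                   # (5.3), w = 1
print(np.linalg.eigvalsh(V_w - V_1).min())              # nonnegative up to rounding
```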

The components of this asymptotic variance can be estimated in a similar way to that described in Section 3. Let

(5.4)    $\hat d_m = \sum_{i=1}^n 1(\hat v_i > -\ell)\,m_\varepsilon(y_i - \hat v_i)\Big/\sum_{i=1}^n 1(\hat v_i > -\ell),$
         $\hat d_{m\delta} = \delta^{-2}\sum_{i=1}^n [q(y_i - \max\{\hat v_i + \delta, -\ell\}) + q(y_i - \max\{\hat v_i - \delta, -\ell\}) - 2q(y_i - \max\{\hat v_i, -\ell\})]\Big/\sum_{i=1}^n 1(\hat v_i > -\ell),$
         $\hat\sigma_m^2 = \sum_{i=1}^n 1(\hat v_i > -\ell)\,m(y_i - \hat v_i)^2\Big/\sum_{i=1}^n 1(\hat v_i > -\ell),$
         $\hat V_\delta = \hat d_{m\delta}^{-2}\hat\sigma_m^2\,(\sum_{i=1}^n 1(\hat v_i > -\ell)X_iX_i'/n)^{-1}, \qquad \hat V = \hat d_m^{-2}\hat\sigma_m^2\,(\sum_{i=1}^n 1(\hat v_i > -\ell)X_iX_i'/n)^{-1},$

where $m(\varepsilon)$ is assumed to be differentiable in the definition of $\hat d_m$. The estimators $\hat V_\delta$ and $\hat V$ are the analogs for the independence case of those given in Section 3 for censored regression. Replacing $\ell$ by $k$ leads to analogous estimators for truncated regression. The following result shows consistency of these estimators under independence of $\varepsilon$ and $X$:
Theorem 5.1: For censored regression, if Assumptions 3.1 and 3.2 are satisfied, $\varepsilon$ and $X$ are independent, and $X = (1, x')'$, then $\sqrt{n}(\hat\beta - \beta_m) \xrightarrow{d} N(0, V)$ for $V$ in equation (5.2). Also, this $V$ is minimized at $w(X) = 1$. In addition, if $m(\varepsilon)$ is continuously differentiable with bounded derivative then $\hat V \xrightarrow{p} V$, while if $\delta \to 0$ and $n^{1/2}\delta \to \infty$ then $\hat V_\delta \xrightarrow{p} V$. Furthermore, for truncated regression the same results hold with $k$ replacing $\ell$ if $P^*(y^* > 0) > 0$.


Because many estimators will be consistent for the slope coefficients under independence, it is possible to choose the most efficient one from among a group of estimators, or to combine several different estimators for improved efficiency. Analogous methods are also available for the truncated regression model, and can be obtained by replacing $\ell$ with $k$ in the following discussion.

The efficient estimator from some class can be chosen by minimizing an estimator of the asymptotic variance. To describe this type of estimator, let $\mathcal{M}$ denote some family of functions $m(\varepsilon)$. For each $m(\cdot)$, let $\hat\beta_m$ denote the censored (or truncated) regression estimator described in Section 3 (or Section 4) for $w(X) = 1$. Also, let $\hat 1_{mi} = 1(X_i'\hat\beta_m > -\ell_m)$ and $v_{mi} = X_i'(\mathrm{plim}\,\hat\beta_m)$. Then the block of the asymptotic variance estimator corresponding to the slope coefficients is

(5.5)    $\hat V_{2m} = (\hat d_m^{-2}\hat\sigma_m^2/\hat a_m)\,\widehat{\mathrm{Var}}(x_i\mid v_{mi} > -\ell_m)^{-1}, \qquad \hat a_m = \sum_{i=1}^n \hat 1_{mi}/n,$
         $\widehat{\mathrm{Var}}(x_i\mid v_{mi} > -\ell_m) = \hat a_m^{-1}\sum_{i=1}^n \hat 1_{mi}x_ix_i'/n - \bar x_m\bar x_m', \qquad \bar x_m = \hat a_m^{-1}\sum_{i=1}^n \hat 1_{mi}x_i/n.$

A member of the class $\{\hat\beta_m : m \in \mathcal{M}\}$ can be selected by choosing $m$ to minimize some measure of the size of $\hat V_{2m}$. If $\widehat{\mathrm{Var}}(x_i\mid v_{mi} > -\ell_m)$ is invariant to $m$ then an equivalent choice could be obtained by minimizing the scalar $\hat d_m^{-2}\hat\sigma_m^2/\hat a_m$.
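A minimal sketch of the selection rule based on (5.5), for censored regression with $w(X) = 1$, a scalar regressor, and the Huber-type family $m_c(\varepsilon) = \max\{-c, \min\{\varepsilon, c\}\}$ indexed by $c$ (so $\ell_m = -c$ and $m_\varepsilon(\varepsilon) = 1(|\varepsilon| < c)$). For each $c$ on a small grid it computes $\hat\beta_m$ by damped Newton steps, then $\hat d_m$, $\hat\sigma_m^2$, $\hat a_m$ and $\widehat{\mathrm{Var}}(x_i\mid v_{mi} > -\ell_m)$ as in (5.4) and (5.5), evaluated at the estimate, and picks the $c$ with the smallest estimated slope variance. The family, the grid, the $t_3$ errors, and all function names are illustrative assumptions.

```python
import numpy as np


def fit_huber_censored(X, y, c, max_iter=200, tol=1e-10):
    """Estimator (3.4) with w = 1 and Huber-type m; here ell = -c."""
    m = lambda e: np.clip(e, -c, c)
    dm = lambda e: (np.abs(e) < c).astype(float)
    q = lambda e: np.where(np.abs(e) <= c, 0.5 * e**2, c * np.abs(e) - 0.5 * c**2)
    obj = lambda b: np.sum(q(y - np.maximum(X @ b, c)))
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS starting value
    for _ in range(max_iter):
        v = X @ b
        ind = (v > c).astype(float)
        r = y - v
        step = np.linalg.solve((X.T * (ind * dm(r))) @ X, X.T @ (ind * m(r)))
        t, f0 = 1.0, obj(b)
        while obj(b + t * step) > f0 and t > 1e-8:     # backtracking
            t *= 0.5
        b = b + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return b


def slope_variance(X, y, b, c):
    """Slope block V_2m of (5.5), divided by n, for differentiable m."""
    n = len(y)
    ind = X @ b > c                                    # 1_mi = 1(X_i'b_m > -ell_m)
    r = (y - X @ b)[ind]
    d = np.mean(np.abs(r) < c)                         # d_m, as in (5.4)
    sig2 = np.mean(np.clip(r, -c, c) ** 2)             # sigma_m^2
    a = ind.mean()                                     # a_m
    xs = X[ind, 1:]                                    # regressors without the constant
    var_x = np.atleast_2d(np.cov(xs, rowvar=False, bias=True))  # Var(x | v > -ell)
    return sig2 / (d**2 * a) * np.linalg.inv(var_x) / n


rng = np.random.default_rng(9)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([2.0, 1.0])
y = np.maximum(0.0, X @ beta0 + rng.standard_t(df=3, size=n))  # heavy-tailed, independent

results = {}
for c in (0.5, 1.0, 2.0):
    b = fit_huber_censored(X, y, c)
    results[c] = (b, slope_variance(X, y, b, c)[0, 0])
best_c = min(results, key=lambda c: results[c][1])
print({c: (round(float(b[1]), 3), float(v)) for c, (b, v) in results.items()}, best_c)
```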

Powell (1986a) suggested such a procedure for choosing among different censored regression quantile estimators. Also, McDonald and Newey (1988) gave regularity conditions for the choice of $m$ to have no effect on the asymptotic variance of $\hat\beta$ in the case of an uncensored regression model and a parametric class of moment restrictions. Intuitively, estimation of the best $m$ has no effect on the asymptotic variance because the probability limit of the slope coefficient estimator does not depend on $m$. We could specify regularity conditions for this result, but for brevity we do not.

An alternative way of proceeding is to combine several estimators from a class using efficient minimum distance estimation. This has the advantage of yielding an estimator that is asymptotically no less efficient than any of the individual ones, as well as a simple test of independence between regressors and disturbance. Its disadvantage is that the small-sample distribution may be adversely affected by estimation of the optimal weighting matrix, as is well known to be a problem in GMM estimation.

For this approach we let $\mathcal{M}$ be a finite set. For notational simplicity we identify each $m$ with an element of the index set $\{j = 1,\ldots,J\}$, and we drop the $2$ subscript on the slope coefficient estimators. Thus, $\hat\beta_j$ will denote the slope coefficients for the $j$th moment condition. Similarly, let $\hat d_j$ denote the derivative estimator, $\ell_j$ the constant point, $\hat 1_{ji} = 1(X_i'\hat\beta_j > -\ell_j)$, $\hat\varepsilon_{ji} = y_i - X_i'\hat\beta_j$, and

$\hat c_{jk} = \sum_{i=1}^n \hat 1_{ji}\hat 1_{ki}\,m_j(\hat\varepsilon_{ji})\,m_k(\hat\varepsilon_{ki}) \Big/ \sum_{i=1}^n \hat 1_{ji}\hat 1_{ki}.$

Then an estimator of the asymptotic covariance between $\hat\beta_j$ and $\hat\beta_k$ is

$\hat\Omega_{jk} = \hat d_j^{-1}\hat d_k^{-1}\hat c_{jk}\,[0,I]\Big(\sum_{i=1}^n \hat 1_{ji}X_iX_i'/n\Big)^{-1}\Big(\sum_{i=1}^n \hat 1_{ji}\hat 1_{ki}X_iX_i'/n\Big)\Big(\sum_{i=1}^n \hat 1_{ki}X_iX_i'/n\Big)^{-1}[0,I]',$

where $[0,I]$ is a selection matrix that picks out the elements corresponding to slope coefficients. Then the partitioned matrix $\hat\Omega = [\hat\Omega_{jk}]$ is an estimator of the joint asymptotic covariance matrix of $\hat\pi = (\hat\beta_1',\ldots,\hat\beta_J')'$. Let $H = [I,I,\ldots,I]'$ be a $J(K-1)\times(K-1)$ partitioned matrix made up of $(K-1)$-dimensional identity matrices. The optimal minimum distance estimator is obtained by minimizing $(\hat\pi - H\beta)'\hat\Omega^{-1}(\hat\pi - H\beta)$. It, the associated asymptotic variance estimator, and the overidentification test statistic are

(5.6)  $\hat\beta = (H'\hat\Omega^{-1}H)^{-1}H'\hat\Omega^{-1}\hat\pi, \qquad \hat V = (H'\hat\Omega^{-1}H)^{-1}, \qquad \hat T = n(\hat\pi - H\hat\beta)'\hat\Omega^{-1}(\hat\pi - H\hat\beta).$
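The covariance estimator $\hat\Omega_{jk}$ and the stacked matrix $\hat\Omega$ can be sketched in NumPy as below. This is an illustration under stated assumptions, not the paper's code. The first column of $X$ is taken to be the constant. The full coefficient vectors $\hat\beta_j$ (intercept included) are used to form $\hat 1_{ji}$ and $\hat\varepsilon_{ji}$. The derivative estimates $\hat d_j$ and the moment functions $m_j$ are supplied by the user, and the helper names are hypothetical.

```python
import numpy as np


def omega_jk(X, y, beta_j, beta_k, ell_j, ell_k, m_j, m_k, d_j, d_k):
    """Estimated asymptotic covariance between slope estimators j and k."""
    n = X.shape[0]
    v_j, v_k = X @ beta_j, X @ beta_k
    one_j = (v_j > -ell_j).astype(float)       # 1_ji
    one_k = (v_k > -ell_k).astype(float)       # 1_ki
    both = one_j * one_k
    e_j, e_k = y - v_j, y - v_k                # residuals eps_ji, eps_ki
    c_jk = np.sum(both * m_j(e_j) * m_k(e_k)) / np.sum(both)
    A_j = (X * one_j[:, None]).T @ X / n
    A_k = (X * one_k[:, None]).T @ X / n
    B_jk = (X * both[:, None]).T @ X / n
    sandwich = np.linalg.solve(A_j, B_jk) @ np.linalg.inv(A_k)
    # [0, I] ... [0, I]' keeps the slope block (drops row and column 0)
    return c_jk / (d_j * d_k) * sandwich[1:, 1:]


def omega_full(X, y, betas, ells, ms, ds):
    """Assemble the J(K-1) x J(K-1) partitioned matrix Omega = [Omega_jk]."""
    J = len(betas)
    blocks = [[omega_jk(X, y, betas[j], betas[k], ells[j], ells[k],
                        ms[j], ms[k], ds[j], ds[k]) for k in range(J)]
              for j in range(J)]
    return np.block(blocks)
```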

Under the regularity conditions previously stated it will be the case that

(5.7)  $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0,V), \qquad \hat V \xrightarrow{p} V, \qquad \hat T \xrightarrow{d} \chi^2((J-1)(K-1)).$

Here $\hat T$ provides a statistic for testing independence that depends on how close the different estimators $\hat\beta_j$ are to each other, analogous to the tests of Koenker and Bassett (1982). Also, the estimator $\hat\beta$ is an optimal (asymptotic variance minimizing) linear combination of the estimators $\hat\beta_1,\ldots,\hat\beta_J$. For brevity we do not state these results as a theorem.
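Given $\hat\pi$ and $\hat\Omega$, the formulas in (5.6) reduce to a few lines of linear algebra. The following sketch uses a hypothetical function name and assumes NumPy. It returns $\hat\beta$, $\hat V$, $\hat T$ and the degrees of freedom $(J-1)(K-1)$ from (5.7).

```python
import numpy as np


def min_distance(pi_hat, omega_hat, J, p, n):
    """Optimal minimum distance combination of J slope estimators, eq. (5.6).

    pi_hat    : (J*p,) stacked estimates (beta_1', ..., beta_J')'
    omega_hat : (J*p, J*p) estimated joint asymptotic covariance matrix
    p         : number of slope coefficients, K - 1
    n         : sample size
    """
    H = np.tile(np.eye(p), (J, 1))              # H = [I, I, ..., I]'
    omega_inv = np.linalg.inv(omega_hat)
    V_hat = np.linalg.inv(H.T @ omega_inv @ H)
    beta_hat = V_hat @ (H.T @ omega_inv @ pi_hat)
    resid = pi_hat - H @ beta_hat
    T_hat = float(n * resid @ omega_inv @ resid)  # overidentification statistic
    df = (J - 1) * p                             # (J-1)(K-1) degrees of freedom
    return beta_hat, V_hat, T_hat, df
```

If SciPy is available, an asymptotic p-value for the independence test could then be computed as `scipy.stats.chi2.sf(T_hat, df)`.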

The estimator selection and efficient minimum distance approaches require multiple estimators of the slope coefficients. These may be constructed as described in Sections 3 and 4, but this requires optimization of multiple objective functions. A more convenient computational approach is to use a linearization to construct estimators from a single initial $\sqrt{n}$-consistent one. To describe this method, let $\tilde\beta$ be an initial estimator of the slope coefficients and let $\tilde\gamma_m$ be obtained as

$\tilde\gamma_m = \arg\min_{\gamma} \sum_{i=1}^n q(y_i - \max\{x_i'\tilde\beta + \gamma, -\ell_m\}).$

This is only a minimization over a scalar, and can easily be carried out, e.g. by grid search. For $\tilde v_i = x_i'\tilde\beta + \tilde\gamma_m$, let $\tilde d_m$ be the same as given in equation (5.4) with $\tilde v_i$ replacing $\hat v_i$, and

$\bar\beta_m = \tilde\beta + \tilde d_m^{-1}[0,I]\Big(\sum_{i=1}^n 1(\tilde v_i > -\ell_m)X_iX_i'/n\Big)^{-1}\sum_{i=1}^n 1(\tilde v_i > -\ell_m)X_i\,m(y_i - \tilde v_i)/n.$

By the usual one-step arguments, this estimator will be asymptotically equivalent to the estimator $\hat\beta_m$ obtained from the full, global minimization of the objective function.
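The two computational steps above might be coded as follows: a scalar grid search for $\tilde\gamma_m$, then the one-step update to $\bar\beta_m$. For concreteness the sketch assumes $q(\varepsilon) = |\varepsilon|/2$, i.e. $m(\varepsilon) = 1(\varepsilon>0) - 1/2$. This $m$ is constant for $\varepsilon\le 0$, so $\ell_m = 0$ is allowed, and with this choice the objective is proportional to the censored LAD criterion of Powell (1984). The grid range is arbitrary. $\tilde d_m$ is passed in because equation (5.4) is not reproduced here. All names are hypothetical.

```python
import numpy as np


def q_example(e):
    # Illustrative choice (assumption): q(e) = |e|/2, so m(e) = 1(e > 0) - 1/2,
    # which is constant for e <= 0; hence ell = 0 is a valid constant point.
    return 0.5 * np.abs(e)


def m_example(e):
    return (e > 0).astype(float) - 0.5


def gamma_grid_search(y, x, beta_tilde, ell, q=q_example, grid=None):
    """Minimize sum_i q(y_i - max{x_i' beta_tilde + g, -ell}) over a grid of g."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 2001)     # hypothetical search range
    index = x @ beta_tilde
    obj = [np.sum(q(y - np.maximum(index + g, -ell))) for g in grid]
    return float(grid[int(np.argmin(obj))])


def one_step(y, x, beta_tilde, gamma, ell, d_tilde, m=m_example):
    """Linearized slope estimator beta_bar_m; d_tilde is supplied by the user."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])        # X_i = (1, x_i')'
    v = x @ beta_tilde + gamma                  # v_i tilde
    ind = (v > -ell).astype(float)
    A = (X * ind[:, None]).T @ X / n
    g = (X * ind[:, None]).T @ m(y - v) / n
    step = np.linalg.solve(A, g)
    return beta_tilde + step[1:] / d_tilde      # [0, I] keeps the slope part
```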

These estimators lead to computationally simpler versions of selecting an efficient estimator in some class and of optimal minimum distance estimation. One could use $\tilde\beta$ and $\tilde\gamma_m$ in computing each of the asymptotic variance estimators for selecting an efficient estimator, and then use the linearized estimator $\bar\beta_m$ at the efficient choice of $m$. Also, one could use the linearized estimators $(\bar\beta_1,\ldots,\bar\beta_J)$ in place of the full global minimizers in constructing the optimal minimum distance estimator. This replacement will produce asymptotically equivalent estimators and test statistics.

The choice of the class $\mathcal{M}$ will affect the efficiency of the estimators. One might choose its members so that the location parameter $\mu_m$ varies, as in the censored regression quantiles of Powell (1986a). This choice leads to a tradeoff between the location in the tail of the disturbance distribution and the proportion of observations used by the estimator. As $\mu_m$ increases, the variation in $m(\varepsilon)$ will be located more in the tail, but $1(x_i'\beta_0 + \mu_m > -\ell) = 1$ more often, so that more observations are included.

An  interesting  open  question  concerns  the  efficiency  of  the  optimal  minimum  distance 
estimator  relative  to  the  efficiency  bound  derived  by  Cosslett  (1987).      In  some 
semiparametric  models  it  is  possible  to  combine  moment  restrictions  so  that  an  optimal 
minimum  distance  estimator  approximately  attains  the  semiparametric  efficiency  bound,   as 
discussed in Newey (1989b). Here, it is difficult to verify this result because of the
complicated  nature  of  the  efficiency  bound  and  the  moment  conditions. 
APPENDIX

Throughout the Appendix $C$ will denote a generic constant that may be different in different uses. The following Assumption is useful in the proofs of the semiparametric information bounds.

Assumption A.1: Parametric submodels correspond to latent densities $f(\varepsilon,X|\eta) = f(\varepsilon|X,\eta)f(X|\eta)$ such that $f(\varepsilon|X,\eta)f(X|\eta)$ is smooth in $\eta$. In addition, for almost all $X$, $f(\varepsilon|X,\eta)$ is a density for $\varepsilon$ that is smooth in $\eta$, with $\int m(\varepsilon)^2 f(\varepsilon|X,\eta)\,d\varepsilon$ bounded in a neighborhood of $\eta_0$. Also, in the truncated case, for $\theta = (\beta',\eta')'$, $\mathrm{Prob}_\theta(y^* > 0) > 0$ for all $\theta$.

The proof of Theorem 3.1 will make use of the following Lemma:

Lemma A.1: If Assumptions 3.1 and A.1 are satisfied then $1(v\le-\ell)S_\beta \in \mathcal{T}$.

Proof: By Lemma A.5 of Newey and Powell (1990), the tangent set of the latent model is $\mathcal{T}^* = \{\delta = \delta(\varepsilon,X) : E[\delta] = 0,\ E[m(\varepsilon)\delta|X] = 0,\ E[\|\delta\|^2] < \infty\}$. Also, by Lemma A.2 of Newey and Powell (1990), for $\mathcal{E}(\delta) = E[\delta|y,X]$, the tangent set of the observed data is the mean-square closure of the set $\{\mathcal{E}(\delta) : \delta\in\mathcal{T}^*\}$. Consider any $\bar\ell > \ell$. Then for $v \le -\bar\ell$, $m(\varepsilon)$ is nonconstant on $(-\infty,-v]$, because $-v \ge \bar\ell > \ell$. Let $L = \{\varepsilon \le -v\}$. Then for $v \le -\bar\ell$, $\mathrm{Var}(m(\varepsilon)|L,X) > 0$, and hence $A(X) = E[1(L)(1,m(\varepsilon))'(1,m(\varepsilon))|X]$ is nonsingular. For $M > 0$ let $D(X) = 0$ if $P(L|X) = 0$ and otherwise

$D(X) = E[1(\varepsilon>-v)(1,m(\varepsilon))s(\varepsilon,X)|X]A(X)^{-1},$

$1_M = 1(\|D(X)\| \le M)\cdot 1(v \le -\bar\ell),$

$\delta = -X\cdot 1_M\cdot\{1(\varepsilon>-v)s(\varepsilon,X) - D(X)1(\varepsilon\le-v)(1,m(\varepsilon))'\}.$

Note that

$E[\delta(1,m(\varepsilon))|X] = -X\cdot 1_M\cdot E[\{1(\varepsilon>-v)s(\varepsilon,X) - D(X)1(\varepsilon\le-v)(1,m(\varepsilon))'\}(1,m(\varepsilon))|X] = 0,$

where the last equality follows by the definition of $D(X)$. Also, $\delta$ has second moments by Assumption 3.1, so that $t = \mathcal{E}(\delta) \in \mathcal{T}$. Furthermore,

$t = -X\cdot 1_M\{1(y>0)s(\varepsilon,X) - 1(y=0)D(X)E[(1,m(\varepsilon))'|y,X]\}$

$= -X\cdot 1_M\{1(y>0)s(\varepsilon,X) - 1(y=0)E[1(\varepsilon>-v)s(\varepsilon,X)|X]/P(L|X)\}$

$= -X\cdot 1_M\{1(y>0)s(\varepsilon,X) + 1(y=0)E[s(\varepsilon,X)|L,X]\} = 1_M S_\beta.$

It follows that $E[\|1(v\le-\bar\ell)S_\beta - t\|^2] = E[\|S_\beta\|^2 1(\|D(X)\|>M)1(v\le-\bar\ell)] \to 0$ as $M\to\infty$. Therefore, $1(v\le-\bar\ell)S_\beta \in \mathcal{T}$. Finally, since $\mathrm{Prob}(v = -\ell) = 0$, $1(-\bar\ell < v \le -\ell) \to 0$ almost surely as $\bar\ell \to \ell$. So by the dominated convergence theorem, $E[\|1(v\le-\bar\ell)S_\beta - 1(v\le-\ell)S_\beta\|^2] = E[1(-\bar\ell<v\le-\ell)\|S_\beta\|^2] \to 0$ as $\bar\ell\to\ell$. Since $\mathcal{T}$ is closed in mean square, $1(v\le-\ell)S_\beta\in\mathcal{T}$. Q.E.D.

Proof of Theorem 3.1: Let

(A.1)  $\delta(\varepsilon,X) = -X\cdot 1(v>-\ell)\cdot\{s(\varepsilon,X) + w^*(X)m(\varepsilon)\}.$

By Assumption 3.1 this random variable has finite mean-square. By Lemma 7.2 of Ibragimov and Hasminskii (1981), and since $E[m(\varepsilon+a)^2|X]$ is bounded on a neighborhood of $a = 0$, $E[m(\varepsilon+a)|X] = \int m(\varepsilon)f(\varepsilon-a|X)\,d\varepsilon$ is differentiable in $a$ at $a = 0$ and $d(X) = \partial E[m(\varepsilon+a)|X]/\partial a = -E[m(\varepsilon)s(\varepsilon,X)|X]$. Then by $E[s(\varepsilon,X)|X] = 0$,

(A.2)  $E[\delta(\varepsilon,X)|X] = -X1(v>-\ell)\{E[s(\varepsilon,X)|X] + w^*(X)E[m(\varepsilon)|X]\} = 0,$

$E[m(\varepsilon)\delta(\varepsilon,X)|X] = -X1(v>-\ell)\{E[m(\varepsilon)s(\varepsilon,X)|X] + w^*(X)E[m(\varepsilon)^2|X]\} = 0.$

Therefore, $\mathcal{E}(\delta) = 1(v>-\ell)S_\beta - S \in \mathcal{T}$. Then, by linearity of $\mathcal{T}$ and Lemma A.1, $S_\beta - S = \mathcal{E}(\delta) + 1(v\le-\ell)S_\beta \in \mathcal{T}$, i.e. $S = S_\beta - t$ for $t\in\mathcal{T}$. Furthermore, by equation (3.1), for any $\delta(\varepsilon,X)$ satisfying $E[\delta(\varepsilon,X)|X] = 0$ and $E[\delta(\varepsilon,X)m(\varepsilon)|X] = 0$,

$E[S'\mathcal{E}(\delta)] = E[S'E[\delta(\varepsilon,X)|y,X]] = E[E[S'\delta(\varepsilon,X)|y,X]]$

$= E[w^*(X)X'1(v>-\ell)m(\varepsilon)\delta(\varepsilon,X)] = E[w^*(X)X'1(v>-\ell)E[m(\varepsilon)\delta(\varepsilon,X)|X]] = 0.$

Then, since $\mathcal{T}$ is the mean-square closure of objects of the form $\mathcal{E}(\delta)$, it follows that $S$ is orthogonal to the tangent set. Therefore, $S$ is the residual from the mean-square projection of $S_\beta$ on the tangent set, and hence is the efficient score. Q.E.D.
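The identity $d(X) = -E[m(\varepsilon)s(\varepsilon,X)|X]$ used in this proof can be checked numerically. Purely for illustration, the sketch assumes that $\varepsilon$ is standard normal and independent of $X$, so the conditioning can be dropped and $s(\varepsilon) = -\varepsilon$. It uses $m(\varepsilon) = \max\{\varepsilon,-1\}$, which is constant for $\varepsilon\le -1$. This $m$ does not have mean zero, but the identity is an integration-by-parts result that does not require it. For this design both sides equal $\mathrm{Prob}(\varepsilon > -1)$.

```python
import math

import numpy as np

ELL = -1.0  # illustrative constant point (assumption)


def m(e):
    # Illustrative m(e) = max(e, ELL), constant for e <= ELL (assumption)
    return np.maximum(e, ELL)


def trapezoid(f, x):
    # Plain trapezoidal rule, written out to avoid NumPy version differences
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)


grid = np.linspace(-10.0, 10.0, 200001)
dens = np.exp(-grid**2 / 2.0) / math.sqrt(2.0 * math.pi)  # f(e), standard normal
score = -grid                                            # s(e) = f'(e)/f(e)


def mean_m_shifted(a):
    # E[m(e + a)] = integral of m(u + a) f(u) du
    return trapezoid(m(grid + a) * dens, grid)


h = 1e-3
d_numeric = (mean_m_shifted(h) - mean_m_shifted(-h)) / (2.0 * h)  # dE[m(e+a)]/da at 0
d_score = -trapezoid(m(grid) * score * dens, grid)                # -E[m(e) s(e)]
d_exact = 0.5 * (1.0 + math.erf(-ELL / math.sqrt(2.0)))            # Prob(e > ELL)
```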

The following Lemma is useful for the proof of Theorems 3.2 and 4.2.

Lemma A.2: Consider an estimator $\hat\beta = \arg\min_{\beta\in\mathcal{B}}\sum_{i=1}^n q(z_i,\beta)$. Suppose that:

i) $\hat\beta \xrightarrow{p} \beta_0$;

ii) there is $b(z)$ with $|q(z,\tilde\beta)-q(z,\beta)| \le b(z)\|\tilde\beta-\beta\|$ and $E[b(z)^2] < \infty$;

iii) there is $U(z)$ with $E[U(z)] = 0$ and $E[U(z)U(z)']$ nonsingular such that, with probability one, $[q(z,\beta)-q(z,\beta_0)-U(z)'(\beta-\beta_0)]/\|\beta-\beta_0\| \to 0$ as $\beta\to\beta_0$;

iv) for $\bar q(\beta) = E[q(z_i,\beta)]$ there is $Q$ nonsingular with $\bar q(\beta) = \bar q(\beta_0) + (\beta-\beta_0)'Q(\beta-\beta_0)/2 + o(\|\beta-\beta_0\|^2)$.

Then $\sqrt{n}(\hat\beta-\beta_0) \xrightarrow{d} N(0, Q^{-1}E[U(z)U(z)']Q^{-1})$.

Proof of Lemma A.2: By ii) and the triangle inequality,

$r(z,\beta) = |[q(z,\beta)-q(z,\beta_0)-U(z)'(\beta-\beta_0)]/\|\beta-\beta_0\|| \le b(z) + \|U(z)\|.$

Then by iii) and the dominated convergence theorem, $E[r(z,\beta)^2] \to 0$ as $\beta\to\beta_0$. The conclusion then follows by Example 3.2.22 of Van der Vaart and Wellner (1996).

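Proof of Theorem 3.2: Consider $r = \min\{-X'\beta,\ell\} \le \ell$ and $\tilde r = \min\{-v,\ell\} \le \ell$. Note that for $\varepsilon \le \ell$, $q(\varepsilon)$ is linear in $\varepsilon$. For $y^* \le 0$ we have $y = 0$, $y^* - \max\{X'\beta,-\ell\} = y^* + r \le \ell$, and $y^* - \max\{v,-\ell\} = y^* + \tilde r \le \ell$, so that

(A.3)  $q(y-\max\{X'\beta,-\ell\}) - q(y-\max\{v,-\ell\}) - [q(y^*-\max\{X'\beta,-\ell\}) - q(y^*-\max\{v,-\ell\})]$

$= 1(y^*\le 0)\{q(r) - q(\tilde r) - [q(y^*+r) - q(y^*+\tilde r)]\} = 0.$

The last equality holds because all four arguments lie in $(-\infty,\ell]$, where $q$ is linear, and $r - \tilde r - [(y^*+r) - (y^*+\tilde r)] = 0$. None of $q(y-\max\{v,-\ell\})$, $q(y^*-\max\{v,-\ell\})$, or $q(\varepsilon)$ depends on $\beta$, and adding terms that do not depend on $\beta$ to the objective function does not change the minimum. It follows that, for $z = (y^*,X)$,

(A.4)  $\hat\beta = \arg\min_{\beta\in\mathcal{B}}\hat Q(\beta), \qquad \hat Q(\beta) = \sum_{i=1}^n q(z_i,\beta)/n,$

$q(z,\beta) = w(X)[q(y^*-\max\{X'\beta,-\ell\}) - q(\varepsilon)].$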
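Equation (A.3) says that replacing the censored $y_i$ by the latent $y_i^*$ changes the objective only by a term that does not depend on $\beta$. A small simulation makes this concrete. It assumes $q(\varepsilon) = |\varepsilon|/2$, which is linear on $(-\infty,0]$, together with $\ell = 0$ and an arbitrary normal design. For this $q$ the gap between the two criteria equals $\tfrac12\sum_{i:y_i^*\le 0}y_i^*$ for every $\beta$.

```python
import numpy as np

ELL = 0.0  # illustrative constant point (assumption)


def q(e):
    # Illustrative q(e) = |e|/2, linear on (-inf, ELL] with ELL = 0 (assumption)
    return 0.5 * np.abs(e)


rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([0.5, 1.0]) + rng.normal(size=n)   # latent y*
y = np.maximum(y_star, 0.0)                               # observed, censored at zero


def observed_obj(beta):
    return np.sum(q(y - np.maximum(X @ beta, -ELL)))


def latent_obj(beta):
    return np.sum(q(y_star - np.maximum(X @ beta, -ELL)))


betas = [np.array([0.0, 0.0]), np.array([0.5, 1.0]),
         np.array([-1.0, 2.0]), np.array([1.5, -0.3])]
gaps = np.array([observed_obj(b) - latent_obj(b) for b in betas])  # same for every beta
```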

Note that $q(z,\beta)$ is continuous in $\beta$. Also, $q(\varepsilon)$ and $\max\{v,-\ell\}$ are Lipschitz in $\varepsilon$ and $v$ respectively, so that

(A.5)  $|q(z,\beta)| \le Cw(X)|v-\max\{X'\beta,-\ell\}| \le Cw(X)\|X\|(\|\beta\|+\|\beta_0\|) \le Cw(X)\|X\|,$

$|q(z,\tilde\beta)-q(z,\beta)| \le Cw(X)\|X\|\,\|\tilde\beta-\beta\|.$

It follows by a standard uniform law of large numbers that $Q(\beta) = E[q(z,\beta)]$ exists and is continuous in $\beta$, and that $\sup_{\beta\in\mathcal{B}}|\hat Q(\beta)-Q(\beta)| \xrightarrow{p} 0$. Since $\hat\beta = \arg\min_{\beta\in\mathcal{B}}\hat Q(\beta)$, consistency will follow from the standard argument if $Q(\beta)$ is uniquely minimized at $\beta_0$.

By the definition of $q(\cdot)$, $q(y^*-\tilde a)-q(y^*-a) = \int_{y^*-a}^{y^*-\tilde a} m(u)\,du$. Since $m(\varepsilon)$ is bounded, $|[q(y^*-\tilde a)-q(y^*-a)]/(\tilde a-a)| \le C$. By the fundamental theorem of calculus and $m(\varepsilon)$ continuous almost everywhere, $[q(y^*-\tilde a)-q(y^*-a)]/(\tilde a-a) \to -m(y^*-a)$ almost surely as $\tilde a\to a$. Therefore, by the dominated convergence theorem,

$E[q(y^*-\tilde a)-q(y^*-a)|X]/(\tilde a-a) = \int\Big[\int_{y^*-a}^{y^*-\tilde a}m(u)\,du/(\tilde a-a)\Big]f(y^*|X)\,dy^* \to -E[m(y^*-a)|X].$

It follows that $q(a,X) = E[q(y^*-a)-q(\varepsilon)|X]$ is differentiable in $a$ with derivative $-E[m(y^*-a)|X]$. Then, by Assumption 3.2, $q(a,X)$ is increasing in $a$ for $a \ge v$ and decreasing in $a$ for $a \le v$, so it has a global minimum at $a = v$. Furthermore, $q(\max\{a,-\ell\},X)$ also has a global minimum at $a = v$, because $q(a,X)$ is increasing in $a$ for $a \ge v$. Therefore,

(A.6)  $E[q(z,\beta)|X] = w(X)q(\max\{X'\beta,-\ell\},X) \ge w(X)q(\max\{v,-\ell\},X) = E[q(z,\beta_0)|X].$

Now, for any $\beta \ne \beta_0$ consider the event $\mathcal{A} = \{w(X)>0,\ v>-\ell,\ d(X)>0,\ \text{and}\ X'\beta\ne v\}$. Assumption 3.2 implies that for $\delta = \beta-\beta_0 \ne 0$, $E[w(X)d(X)1(v>-\ell)(v-X'\beta)^2] = \delta'Q\delta > 0$, so that $\mathrm{Prob}(\mathcal{A}) > 0$. Also, it follows from the proof of Theorem 3.1 that $E[m(y^*-a)|X]$ is differentiable at $a = v$ with derivative $-d(X)$, so that $d(X) = \partial^2 q(a,X)/\partial a^2|_{a=v}$. When $d(X) > 0$, $q(a,X)$ will have a unique local minimum at $a = v$, which is a unique global minimum by the previous argument. When $\mathcal{A}$ occurs, $\max\{X'\beta,-\ell\} \ne v = \max\{v,-\ell\}$, so that

$E[q(z,\beta)|X] = w(X)q(\max\{X'\beta,-\ell\},X) > w(X)q(\max\{v,-\ell\},X) = E[q(z,\beta_0)|X].$

Then, since $\mathrm{Prob}(\mathcal{A}) > 0$,

$Q(\beta) = E[q(z,\beta)] = P(\mathcal{A})E[q(z,\beta)|\mathcal{A}] + P(\mathcal{A}^c)E[q(z,\beta)|\mathcal{A}^c]$

$= P(\mathcal{A})E[E[q(z,\beta)|X]|\mathcal{A}] + P(\mathcal{A}^c)E[E[q(z,\beta)|X]|\mathcal{A}^c]$

$> P(\mathcal{A})E[E[q(z,\beta_0)|X]|\mathcal{A}] + P(\mathcal{A}^c)E[E[q(z,\beta_0)|X]|\mathcal{A}^c] = Q(\beta_0).$

Thus, $Q(\beta)$ has a unique minimum at $\beta_0$, giving consistency.

We have now shown that condition i) of Lemma A.2 holds. Condition ii) follows by equation (A.5).

Now, for condition iii), let $\mathcal{D} = \{(y^*,X) : v \ne -\ell$ and $q(\varepsilon)$ is continuously differentiable at $y^*-\max\{v,-\ell\}\}$, and note that $\mathrm{Prob}(\mathcal{D}) = 1$ by Assumptions 3.1 and 3.2. On $\mathcal{D}$, $\max\{X'\beta,-\ell\}$ is linear in $\beta$ in a neighborhood of $\beta_0$, with derivative $1(v>-\ell)X$ at $\beta_0$. Also, on $\mathcal{D}$, $q(y^*-a)$ is continuously differentiable at $a = \max\{v,-\ell\}$ with derivative $-m(y^*-a)$. Then by the chain rule, on $\mathcal{D}$, $q(z,\beta) = w(X)[q(y^*-\max\{X'\beta,-\ell\})-q(\varepsilon)]$ is differentiable at $\beta_0$ with derivative $-w(X)m(y^*-\max\{v,-\ell\})1(v>-\ell)X = -w(X)1(v>-\ell)m(\varepsilon)X = U(z)$. Since $E[U(z)] = E[E[U(z)|X]] = -E[w(X)1(v>-\ell)E[m(\varepsilon)|X]X] = 0$ and $E[U(z)U(z)'] = \Sigma$, condition iii) of Lemma A.2 is satisfied.

For condition iv), note that, as shown above, $q(a,X)$ is differentiable with derivative $-E[m(y^*-a)|X] = -E[m(\varepsilon+v-a)|X]$. Also, it follows as in the proof of Theorem 3.1 that $E[m(\varepsilon+v-a)|X]$ is differentiable in $a$ with

$\partial E[m(\varepsilon+v-a)|X]/\partial a = \int m(\varepsilon+v)f_\varepsilon(\varepsilon+a|X)\,d\varepsilon = \int m(\varepsilon+v-a)f_\varepsilon(\varepsilon|X)\,d\varepsilon = E[m(\varepsilon+v-a)s(\varepsilon|X)|X],$

where $s(\varepsilon|X) = f_\varepsilon(\varepsilon|X)/f(\varepsilon|X)$. Since $m(\varepsilon)$ is continuous almost everywhere, it follows that $\partial E[m(\varepsilon+v-a)|X]/\partial a$ is continuous, so that $q(a,X)$ is twice continuously differentiable, and

$|q_{aa}(a,X)| \le Ca(X), \qquad a(X) = E[|s(\varepsilon|X)|\,|X].$

For notational convenience, let $b = -\ell$, $\tilde v = X'\beta$, $\tilde r = \max\{\tilde v,b\}$, $r = \max\{v,b\}$, and suppress the $*$ on $y^*$. Then a second order mean-value expansion gives

(A.7)  $E[q(z,\beta)-q(z,\beta_0)|X] = w(X)[q(\tilde r,X)-q(r,X)]$

$= w(X)[q_a(r,X)(\tilde r-r) + q_{aa}(\bar r,X)(\tilde r-r)^2/2],$

where $\bar r$ lies between $\tilde r$ and $r$. Now, note that

$\tilde r - r = 1(v>b)1(\tilde v>b)(\tilde v-v) + 1(v>b)1(\tilde v\le b)(b-v) + 1(v\le b)1(\tilde v>b)(\tilde v-b)$

$= 1(v>b)(\tilde v-v) + 1(v>b)1(\tilde v\le b)(b-\tilde v) + 1(v\le b)1(\tilde v>b)(\tilde v-b).$

Also, $1(v>b)q_a(r,X) = 1(v>b)q_a(v,X) = -1(v>b)E[m(\varepsilon)|X] = 0$, so that for $\bar v$ between $b$ and $v$,

$|q_a(r,X)(\tilde r-r)| = |q_a(b,X)1(v\le b)1(\tilde v>b)(\tilde v-b)| \le |q_{aa}(\bar v,X)|1(v\le b)1(\tilde v>b)|\tilde v-b||b-v|$

$\le Ca(X)1(v\le b)1(\tilde v>b)|\tilde v-v|^2 \le Ca(X)1(v\le b)1(\tilde v>b)\|X\|^2\|\beta-\beta_0\|^2.$

Since $\mathrm{Prob}(v=b) = 0$, $1(v\le b)1(\tilde v>b) \to 0$ w.p.1 as $\beta\to\beta_0$. So by the dominated convergence theorem and the existence of $E[a(X)\|X\|^2]$ (as implied by Assumption 3.1), $E[a(X)1(v\le b)1(\tilde v>b)\|X\|^2] \to 0$. Therefore, as $\beta\to\beta_0$,

(A.8)  $E[w(X)|q_a(r,X)(\tilde r-r)|] \le CE[a(X)1(v\le b)1(\tilde v>b)\|X\|^2]\|\beta-\beta_0\|^2 = o(\|\beta-\beta_0\|^2).$

By similar reasoning,

(A.9)  $E[w(X)|q_{aa}(\bar r,X)1(v>b)1(\tilde v\le b)(b-\tilde v)^2|] = o(\|\beta-\beta_0\|^2),$

(A.10)  $E[w(X)|q_{aa}(\bar r,X)1(v\le b)1(\tilde v>b)(\tilde v-b)^2|] = o(\|\beta-\beta_0\|^2).$

Also, as $\beta\to\beta_0$, $q_{aa}(\bar r,X)1(v>b) \to d(X)1(v>b)$, so by the dominated convergence theorem,

(A.11)  $E[w(X)q_{aa}(\bar r,X)1(v>b)(\tilde v-v)^2] = (\beta-\beta_0)'E[w(X)q_{aa}(\bar r,X)1(v>b)XX'](\beta-\beta_0)$

$= (\beta-\beta_0)'Q(\beta-\beta_0) + o(\|\beta-\beta_0\|^2).$

Taking expectations of equation (A.7) and applying the triangle inequality to equations (A.8)-(A.11) then gives condition iv). The first conclusion then follows from the conclusion of Lemma A.2. The second follows upon noting that if $w(X) = w^*(X)$, then

$Q = E[\{d(X)/E[m(\varepsilon)^2|X]\}d(X)1(v>-\ell)XX'] = E[SS'],$

$\Sigma = E[\{d(X)/E[m(\varepsilon)^2|X]\}^2E[m(\varepsilon)^2|X]1(v>-\ell)XX'] = E[SS'].$ Q.E.D.

Proof of Theorem 3.3: Note that if $m(\varepsilon)$ is differentiable then, by Assumptions 3.1 and 3.2, $1(X'\beta>-\ell)w(X)^2m(y-X'\beta)^2XX'$ and $1(X'\beta>-\ell)w(X)m_\varepsilon(y-X'\beta)XX'$ are continuous at $\beta_0$ with probability one and dominated by functions with finite expectation. Consistency of $\hat Q$ and $\hat\Sigma$ then follows by Lemma 4.3 of Newey and McFadden (1994). In the other case, let $Q(z,\beta) = w(X)XX'[q(y-\max\{X'\beta,-\ell\})-q(y-\max\{v,-\ell\})]$, so that, for the first unit vector $e_1 = (1,0,\ldots,0)'$,

$\hat Q = \sum_{i=1}^n[Q(z_i,\hat\beta+e_1\delta)+Q(z_i,\hat\beta-e_1\delta)-2Q(z_i,\hat\beta)]/(\delta^2 n).$

Note that by equation (A.5), $\|Q(z,\tilde\beta)-Q(z,\beta)\| \le Cw(X)\|X\|^3\|\tilde\beta-\beta\|$ and $E[w(X)^2\|X\|^6] < \infty$. Then by Theorem 2.7.11 of Van der Vaart and Wellner (1996), for any $\Delta\to 0$, $\tilde\beta$ and $\beta$ in a small enough neighborhood of $\beta_0$, and $\bar Q(\beta) = E[Q(z_i,\beta)]$,

$\sup_{\|\tilde\beta-\beta\|\le\Delta}\Big\|n^{-1/2}\sum_{i=1}^n\{Q(z_i,\tilde\beta)-Q(z_i,\beta)-[\bar Q(\tilde\beta)-\bar Q(\beta)]\}\Big\| \xrightarrow{p} 0.$

It follows by $n^{1/2}\delta^2\to\infty$ that

(A.12)  $\hat Q = [\bar Q(\hat\beta+e_1\delta)+\bar Q(\hat\beta-e_1\delta)-2\bar Q(\hat\beta)]/\delta^2 + o_p(1).$

Furthermore, it follows by equation (A.3) and the associated discussion that we can take

$\bar Q(\beta) = E[XX'q(z,\beta)] = E[w(X)XX'q(\max\{X'\beta,-\ell\},X)],$

for $q(z,\beta)$ as defined in eq. (A.4). Let $\bar Q_{jk}(\beta) = E[X_jX_kq(z,\beta)]$. It follows by the expansion in eq. (A.7) that, for $\tilde r$ and $r$ as defined there,

$\bar Q_{jk}(\beta)-\bar Q_{jk}(\beta_0) = E[w(X)X_jX_k\{q_a(r,X)(\tilde r-r) + q_{aa}(\bar r,X)(\tilde r-r)^2/2\}].$

Note that $E[w(X)a(X)\|X\|^4] \le CE[w(X)a(X)^2\|X\|^4] + CE[w(X)\|X\|^4] \le CE[s(\varepsilon|X)^2\|X\|^4] + C$, which is finite by Assumption 3.1. It then follows, similarly to the proof of condition iv) of Lemma A.2 in the proof of Theorem 3.2, that for

$M_{jk} = E[w(X)d(X)1(v>-\ell)X_jX_kXX']/2,$

$\bar Q_{jk}(\beta) = \bar Q_{jk}(\beta_0) + (\beta-\beta_0)'M_{jk}(\beta-\beta_0) + o(\|\beta-\beta_0\|^2).$

Therefore, for $\hat\Delta = \hat\beta-\beta_0$, and noting that $\|\hat\Delta\pm e_1\delta\|^2 = O_p(\delta^2)$,

$[\bar Q_{jk}(\hat\beta+e_1\delta)+\bar Q_{jk}(\hat\beta-e_1\delta)-2\bar Q_{jk}(\hat\beta)]/\delta^2$

$= [(\hat\Delta+e_1\delta)'M_{jk}(\hat\Delta+e_1\delta) + (\hat\Delta-e_1\delta)'M_{jk}(\hat\Delta-e_1\delta) - 2\hat\Delta'M_{jk}\hat\Delta]/\delta^2 + o_p(1)$

$= 2e_1'M_{jk}e_1 + o_p(1) = E[w(X)d(X)1(v>-\ell)X_jX_k] + o_p(1) = Q_{jk} + o_p(1).$

The conclusion then follows by eq. (A.12). Q.E.D.
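The numerical second-difference estimator $\hat Q$ from the proof above can be sketched as follows (hypothetical helper, NumPy assumed). The step $\delta$ must satisfy $\delta\to 0$ and $n^{1/2}\delta^2\to\infty$. For example, $\delta_n = c\,n^{-1/5}$ meets both conditions, but the constant $c$ is left to the user.

```python
import numpy as np


def q_hat_second_difference(y, X, beta_hat, ell, q, delta, w=None):
    """Second-difference estimator of Q in the intercept direction e_1.

    Averages w(X_i) X_i X_i' [q(y_i - max{X_i'b + delta, -ell})
        + q(y_i - max{X_i'b - delta, -ell}) - 2 q(y_i - max{X_i'b, -ell})] / delta^2.
    The term q(y_i - max{v_i, -ell}) in Q(z, beta) cancels in the second difference.
    """
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    index = X @ beta_hat                     # X_i' beta_hat; X[:, 0] is the constant

    def q_at(shift):
        return q(y - np.maximum(index + shift, -ell))

    second_diff = (q_at(delta) + q_at(-delta) - 2.0 * q_at(0.0)) / delta**2
    return (X * (w * second_diff)[:, None]).T @ X / n
```

As a check, with a quadratic $q$ and all indices well above $-\ell$, each second difference equals one, so $\hat Q$ reduces to $\sum_i w(X_i)X_iX_i'/n$.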

For the truncated case and $T = \{y^*>0\}$, note that $E[\cdot] = E^*[\cdot|T] = E^*[1(T)(\cdot)]/P^*(T)$ and $E[\cdot|X] = E^*[\cdot|T,X] = E^*[1(T)(\cdot)|X]/E^*[1(T)|X]$. The following Lemma will be used in the proof of Theorem 4.1:

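Lemma A.3: If Assumptions 3.1 and A.1 are satisfied then $1(v\le-k)S_\beta \in \mathcal{T}$.

Proof: Let $T^c = \{y^*\le 0\}$ and $\tilde s(\varepsilon,X) = 1(T)\{s(\varepsilon,X)-E^*[s(\varepsilon,X)|T,X]\}$. Note that $E[\tilde s(\varepsilon,X)|X] = E^*[\tilde s(\varepsilon,X)|T,X] = 0$. Also,

$E^*[\|X\|^2\tilde s(\varepsilon,X)^2] = P^*(T)E[\|X\|^2\tilde s(\varepsilon,X)^2] = P^*(T)E[\|X\|^2E[\tilde s(\varepsilon,X)^2|X]] = P^*(T)E[\|X\|^2\mathrm{Var}(s(\varepsilon,X)|X)]$

$\le P^*(T)E[\|X\|^2E[s(\varepsilon,X)^2|X]] = P^*(T)E[\|X\|^2s(\varepsilon,X)^2] \le E^*[\|X\|^2s(\varepsilon,X)^2] < \infty.$

Consider $\bar k > k$ and the case where $v < -\bar k$. By the definition of $k$, $m(\varepsilon)$ will be nonzero on $\varepsilon < -v$. Therefore, $E^*[1(T^c)m(\varepsilon)^2|X] > 0$. Let $A(X) = E^*[(1(T),1(T^c)m(\varepsilon))'(1,m(\varepsilon))|X]$. Note that by $E^*[m(\varepsilon)|X] = 0$, we have $E^*[1(T^c)m(\varepsilon)|X] = -E^*[1(T)m(\varepsilon)|X]$, and hence

$\mathrm{Det}(A(X)) = E^*[1(T^c)m(\varepsilon)^2|X]E^*[1(T)|X] - E^*[1(T^c)m(\varepsilon)|X]E^*[1(T)m(\varepsilon)|X]$

$= E^*[1(T^c)m(\varepsilon)^2|X]E^*[1(T)|X] + E^*[1(T^c)m(\varepsilon)|X]^2 > 0, \qquad (v < -\bar k).$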


Let $D(X) = 1(v\le-\bar k)E^*[\tilde s(\varepsilon,X)(1,m(\varepsilon))|X]A(X)^{-1}$, let $M$ be some positive constant, and

$\delta(\varepsilon,X) = -X\cdot 1(v\le-\bar k)\cdot 1(\|D(X)\|\le M)\cdot[\tilde s(\varepsilon,X) - D(X)(1(T),1(T^c)m(\varepsilon))'].$

By the definition of $D(X)$, it follows that $E^*[\delta(1,m(\varepsilon))|X] = 0$. Also, $E^*[\delta'\delta] < \infty$ by $E^*[m(\varepsilon)^2\|X\|^2] < \infty$ and $E^*[\|X\|^2\tilde s(\varepsilon,X)^2] < \infty$. By Lemma A.5 of Newey and Powell (1990), $\delta$ is in the tangent set $\mathcal{T}^*$ for the latent model. For $\delta\in\mathcal{T}^*$, let $\mathcal{E}(\delta)$ be the transformation from the latent to the observed data given by $\mathcal{E}(\delta) = \delta - E^*[\delta|T]$. By Lemma B.2 of Newey (1991), the tangent set for the observed data is the mean-square closure of $\{\mathcal{E}(\delta) : \delta\in\mathcal{T}^*\}$. Note also that $1(T^c)$ is zero in the observed data and that $1(T) - E^*[1(T)|T] = 0$ in the observed data, so that $\mathcal{E}((1(T),1(T^c)m(\varepsilon))) = 0$. Also, $\mathcal{E}(\tilde s(\varepsilon,X)) = \tilde s(\varepsilon,X)$. Therefore, by $S_\beta = -X\tilde s(\varepsilon,X)$,

$\mathcal{E}(\delta) = 1(v\le-\bar k)\cdot 1(\|D(X)\|\le M)\cdot S_\beta.$

The conclusion then follows as in the proof of Lemma A.1. Q.E.D.
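Proof of Theorem 4.1: Let $\delta(\varepsilon,X)$ be as defined in equation (A.1), with $k$ in place of $\ell$. As shown in the proof of Theorem 3.1, $E[\delta(\varepsilon,X)(1,m(\varepsilon))|X] = 0$ and $E[\|\delta(\varepsilon,X)\|^2] < \infty$, so that $\mathcal{E}(\delta)\in\mathcal{T}$. Since $m(\varepsilon)$ is zero for $\varepsilon < k$, $1(v>-k)m(\varepsilon) = 1(T)1(v>-k)m(\varepsilon)$, so that $\mathcal{E}(X\cdot 1(v>-k)\cdot w^*(X)m(\varepsilon)) = S$. Therefore, $\mathcal{E}(\delta) = 1(v>-k)S_\beta - S\in\mathcal{T}$. By Lemma A.3 it then follows that $S_\beta - S\in\mathcal{T}$, i.e. $S = S_\beta - t$ with $t\in\mathcal{T}$. Also, note that

$E^*[1(T)S|X] = Xw^*(X)E^*[1(T)1(v>-k)m(\varepsilon)|X] = Xw^*(X)E^*[1(v>-k)m(\varepsilon)|X] = Xw^*(X)1(v>-k)E^*[m(\varepsilon)|X] = 0.$

Therefore, for any $\delta$ satisfying $E^*[\delta(\varepsilon,X)m(\varepsilon)|X] = 0$,

$E[S'\mathcal{E}(\delta)] = E^*[1(T)S'\{\delta(\varepsilon,X)-E^*[\delta(\varepsilon,X)|T]\}]/P^*(T)$

$= E^*[1(T)S'\delta(\varepsilon,X)]/P^*(T) - E^*[1(T)S'E^*[\delta(\varepsilon,X)|T]]/P^*(T)$

$= E^*[E^*[S'\delta(\varepsilon,X)|X]]/P^*(T) = E^*[w^*(X)X'1(v>-k)E^*[m(\varepsilon)\delta(\varepsilon,X)|X]]/P^*(T) = 0.$

Here the second term in the middle line vanishes because $E^*[\delta(\varepsilon,X)|T]$ is a constant and $E^*[1(T)S|X] = 0$. Then, since $\mathcal{T}$ is the mean-square closure of objects of the form $\mathcal{E}(\delta)$, it follows that $S$ is orthogonal to the tangent set. Q.E.D.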

Proof of Theorem 4.2: Let $q(z,\beta) = w(X)[q(y^*-\max\{X'\beta,-k\})-q(\varepsilon)]$. It follows as in the proof of Theorem 3.2 that for all $y^*$ (positive or negative),

$|q(z,\beta)| \le Cw(X)\|X\|, \qquad |q(z,\tilde\beta)-q(z,\beta)| \le Cw(X)\|X\|\,\|\tilde\beta-\beta\|.$

It follows that $E^*[1(T^c)|q(z,\beta)|] \le CE^*[1(T^c)w(X)\|X\|]$ is finite. Also, $q(\varepsilon)$ is constant for $\varepsilon\le k$, and on $T^c$ we have $y^*-\max\{X'\beta,-k\} \le k$. Hence $1(T^c)q(y^*-\max\{X'\beta,-k\}) = 1(T^c)q(k)$. Therefore,

$E[q(z,\beta)] = E^*[1(T)q(z,\beta)]/P^*(T) = E^*[q(z,\beta)]/P^*(T) - E^*[1(T^c)q(z,\beta)]/P^*(T)$

$= E^*[q(z,\beta)]/P^*(T) - E^*[1(T^c)w(X)\{q(k)-q(\varepsilon)\}]/P^*(T).$

That is, $E[q(z,\beta)] = C_1E^*[q(z,\beta)] + C_2$ for constants $C_1 > 0$ and $C_2$. It follows that the minimum of $E[q(z,\beta)]$ will coincide with the minimum of $E^*[q(z,\beta)]$, which is uniquely attained at $\beta_0$, as shown in the proof of Theorem 3.2. Furthermore, since $q(\varepsilon)$ is constant below $k$, it is also linear below $k$. Hence $E^*[q(z,\beta)]$ inherits all the properties that $E[q(z,\beta)]$ has in the censored case. In particular, $E[q(z,\beta)] = E[q(z,\beta_0)] + (\beta-\beta_0)'Q(\beta-\beta_0)/2 + o(\|\beta-\beta_0\|^2)$, as in the proof of Theorem 3.2. Thus, conditions i), ii), and iv) of Lemma A.2 hold. Condition iii) of Lemma A.2 also follows as in the proof of Theorem 3.2, so that the conclusion follows from the conclusion of Lemma A.2. Q.E.D.
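The sample analog of the relation $E[q(z,\beta)] = C_1E^*[q(z,\beta)] + C_2$ can be verified directly. The sketch below assumes the illustrative $q(\varepsilon) = 1 + \max\{\varepsilon-k,0\}^2$, which is constant for $\varepsilon\le k$, with $k = 0.5$ and a simulated normal design. In it, the latent-data criterion exceeds the truncated-sample criterion by exactly $q(k)$ times the number of truncated observations, whatever the value of $\beta$.

```python
import numpy as np

K_PT = 0.5  # illustrative constant point k (assumption)


def q(e):
    # Illustrative q, constant (= 1) for e <= K_PT; m(e) = 2 max(e - K_PT, 0) is zero there
    return 1.0 + np.maximum(e - K_PT, 0.0) ** 2


rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([0.2, 1.0]) + rng.normal(size=n)   # latent y*
T = y_star > 0                                           # only these are observed


def truncated_obj(beta):
    return np.sum(q(y_star[T] - np.maximum(X[T] @ beta, -K_PT)))


def latent_obj(beta):
    return np.sum(q(y_star - np.maximum(X @ beta, -K_PT)))


betas = [np.array([0.0, 0.0]), np.array([0.2, 1.0]), np.array([-0.5, 1.5])]
gaps = np.array([latent_obj(b) - truncated_obj(b) for b in betas])  # = q(k) * #T^c
```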

Proof  of  Theorem  4.3:      Follows  by  extending  the  results  for  the  censored  case  to  the 
truncated  case  analogously  to  the  proof  of  Theorem  4.2.     Q.E.D. 

Proof of Theorem 5.1: By equation (5.1), the conditional moment restriction is satisfied in the latent data at $\beta = \beta_m$, so that $\sqrt{n}(\hat\beta_m - \beta_m) \xrightarrow{d} N(0,Q_m^{-1}\Sigma_mQ_m^{-1})$ by the conclusion of Theorem 3.2. Also, by independence, $d(X) = d_m$ and $E[m(\varepsilon-\mu_m)^2|X] = \sigma_m^2$. Hence $Q_m = d_mE[w(X)1(v>-\ell)XX']$ and $\Sigma_m = \sigma_m^2E[w(X)^2\,1(v>-\ell)XX']$, giving the first conclusion. For the second conclusion, note that for $Y = w(X)1(v>-\ell)X$ and $U = 1(v>-\ell)X$,

$(E[w(X)1(v>-\ell)XX'])^{-1}E[w(X)^2\,1(v>-\ell)XX'](E[w(X)1(v>-\ell)XX'])^{-1} - (E[1(v>-\ell)XX'])^{-1}$

$= (E[YU'])^{-1}E[YY'](E[UY'])^{-1} - (E[UU'])^{-1}$

$= (E[YU'])^{-1}\{E[YY'] - E[YU'](E[UU'])^{-1}E[UY']\}(E[UY'])^{-1}.$

This is positive semi-definite because the matrix in curly braces is positive semi-definite: it is the second moment matrix of the residual from the population regression of $Y$ on $U$.
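The matrix identity above can be checked numerically, with sample moments in place of expectations. The design, the indicator standing in for $1(v>-\ell)$, and the positive weights are simulated purely for illustration. The check confirms that the two expressions agree and that the difference has no negative eigenvalues (up to rounding).

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 2000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
ind = (rng.uniform(size=n) < 0.7).astype(float)   # stands in for 1(v > -ell)
w = rng.uniform(0.5, 2.0, size=n)                   # arbitrary positive weights w(X)

Y = (w * ind)[:, None] * X                          # Y = w(X) 1(v > -ell) X
U = ind[:, None] * X                                # U = 1(v > -ell) X
E_YU = Y.T @ U / n                                  # sample analogs of the moments
E_YY = Y.T @ Y / n
E_UU = U.T @ U / n

lhs = np.linalg.inv(E_YU) @ E_YY @ np.linalg.inv(E_YU.T) - np.linalg.inv(E_UU)
middle = E_YY - E_YU @ np.linalg.inv(E_UU) @ E_YU.T
rhs = np.linalg.inv(E_YU) @ middle @ np.linalg.inv(E_YU.T)
min_eig = np.linalg.eigvalsh((lhs + lhs.T) / 2).min()
```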

For the third conclusion, note that, similarly to the proof of Theorem 3.3, $\hat\alpha = \sum_{i=1}^n 1(\hat v_i>-\ell)/n \xrightarrow{p} E[1(v>-\ell)] = \alpha$. Also, $\hat\alpha\hat d_m$ ($\hat\alpha\hat d_{m\delta}$) is the upper left element of $\hat Q$ ($\hat Q_\delta$), and $\alpha d_m$ is the upper left element of $Q$. Then $\hat\alpha\hat d_m \xrightarrow{p} \alpha d_m$ and $\hat\alpha\hat d_{m\delta} \xrightarrow{p} \alpha d_m$ follow as in the proof of Theorem 3.3. The one difference is that sixth moments of $X$ need not exist here, because only convergence of the upper left element of $\hat Q_\delta$ is required. Q.E.D.